Once all no-production metabolites in the model have been identified, the next step is to fill these gaps with minimal modifications using the three mechanisms described earlier. We first explore whether reversing reaction directionality and/or adding reactions from MetaCyc [18] that are absent from the original model links the problem metabolite to substrates present in the model. This is accomplished by using a database of candidate reactions consisting of (i) all reactions in the original model with their directionalities reversed and (ii) reactions from a curated version of the MetaCyc database, including allowable transport mechanisms between compartments (in the case of multi-compartment models). Note that all reactions in the MetaCyc database are treated as reversible in the model. These candidate reactions define the set Database, while the set Model is composed of the original genome-scale model reactions. If neither of the two mechanisms (single-compartment models) nor any of the three mechanisms (multi-compartment models) can connect a cytosolic no-production metabolite, an uptake reaction is arbitrarily added to the model to restore connectivity. However, if a non-cytosolic metabolite present in an inner compartment (in the case of multi-compartment models) cannot be fixed by any of the above mechanisms, it is flagged as unfixable given the employed mechanisms.
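The construction of the candidate set can be illustrated with a minimal Python sketch. The `Reaction` class, the `_rev` naming suffix, and the stoichiometry-based test for "absent from the original model" are assumptions made for illustration, not part of the original implementation. The sketch also skips reversing model reactions that are already reversible, since the reversed copy would be redundant.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reaction:
    """Hypothetical reaction record: signed stoichiometry, negative = consumed."""
    name: str
    stoich: tuple  # ((metabolite, coefficient), ...)
    reversible: bool = False


def reverse(rxn):
    """Return a copy of the reaction with its direction flipped."""
    flipped = tuple((met, -coef) for met, coef in rxn.stoich)
    return replace(rxn, name=rxn.name + "_rev", stoich=flipped)


def build_candidate_database(model, metacyc):
    """Build the candidate set from reversed model reactions and new MetaCyc reactions."""
    # Stoichiometries already covered by the model, in either direction.
    known = {frozenset(r.stoich) for r in model}
    known |= {frozenset(reverse(r).stoich) for r in model}

    # (i) Model reactions with reversed directionality
    #     (assumption: already-reversible reactions are skipped).
    candidates = [reverse(r) for r in model if not r.reversible]

    # (ii) MetaCyc reactions absent from the model, all treated as reversible.
    for r in metacyc:
        if frozenset(r.stoich) not in known:
            candidates.append(replace(r, reversible=True))
    return candidates


# Toy data: one model reaction, two MetaCyc entries (one duplicates the model).
model = [Reaction("R1", (("A", -1), ("B", 1)))]
metacyc = [
    Reaction("M1", (("B", -1), ("C", 1))),
    Reaction("M2", (("A", -1), ("B", 1))),
]
database = build_candidate_database(model, metacyc)
print([r.name for r in database])
```

In this toy case the MetaCyc entry that duplicates a model reaction is dropped, so the candidate set holds only the reversed model reaction and the one new MetaCyc reaction.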
In addition to the binary variables w_ij defined previously, the proposed (GapFill) formulation relies on the binary variables y_j, defined as follows: y_j = 1 if reaction j from the Database is added to the model, and y_j = 0 otherwise.
For single-compartment models, the task of identifying the minimal set of additional reactions that enables the production of a no-production metabolite i* is posed as the following mixed integer linear programming problem (GapFill).
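One way to write the objective and constraint sets (17) to (19), consistent with the verbal description of (GapFill) in this section, is sketched below. This is a hedged reading rather than the authors' exact statement. The symbols $S_{ij}$ (stoichiometric coefficient of metabolite $i$ in reaction $j$), $v_j$ (flux through reaction $j$), and $LB_j$, $UB_j$ (flux bounds) are assumed notation. Constraints (14) to (16) are omitted from the sketch because their form depends on (6), (7), and the definition of $w_{ij}$ given earlier.

```latex
\begin{align}
\min \quad & \sum_{j \in \text{Database}} y_j \tag{13} \\
\text{s.t.} \quad
& \sum_{j} S_{ij}\, v_j \ge 0, \quad \forall i \in \text{cytosol} \tag{17} \\
& LB_j \le v_j \le UB_j, \quad \forall j \in \text{Model} \tag{18} \\
& LB_j\, y_j \le v_j \le UB_j\, y_j, \quad \forall j \in \text{Database} \tag{19}
\end{align}
```

Under this reading, setting $y_j = 0$ forces $v_j = 0$ for a Database reaction, so a candidate reaction can carry flux only if it is counted in the objective.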
In (GapFill), the objective function (13) minimizes the number of reactions added from the Database so as to restore flow through metabolite i*. Constraints (14) and (15) are identical to (6) and (7). Constraint (16) ensures that, with these additions, at least δ units of the no-production metabolite i* are produced. Constraint set (17), as in (GapFind), allows for the free drain of all cytosolic metabolites, while bounds on reactions present in the Model are imposed by constraint set (18). Constraint set (19) ensures that only those reactions from the Database that carry nonzero flow are added to the model. This formulation restores flow through no-production metabolites in single-compartment models.

For multi-compartment models, the (GapFill) formulation is modified. First, gaps in the cytosol are filled using the mechanisms described earlier for single-compartment models. Specifically, the (GapFill) formulation is modified by replacing constraint (17) with constraints (11) and (12), reflecting the fact that no net production term can be imposed for metabolites present within compartments incapable of communicating directly with the extracellular space.

The solution of formulation (GapFill) once for each no-production metabolite
i* identifies one mechanism at a time for resolving connectivity problems in the model. Through the use of integer cuts [29], multiple hypotheses can be generated to resolve these connectivity problems. In this study, we evaluate the merit of the generated hypotheses and choose the most probable one by applying the following three criteria sequentially: (a) The added hypothesis should not contain cycles. Because the MetaCyc database contains multiple copies of the same reaction (present in different organisms) and all MetaCyc reactions are considered reversible, there is a proclivity to fix metabolites by adding two copies of the same reaction in opposite directions, thereby forming a cycle. (b) We choose the hypothesis that enables production of the problem metabolite with the fewest modifications. (c) We choose the hypothesis that has the higher probability of being accurate based on our validation metrics (e.g., if two reactions are added, we choose the one with the better BLAST score). Note that a GAMS implementation of (GapFill) is available as an additional file [see Additional file 4].
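The sequential application of criteria (a) to (c) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the study: each hypothesis is a list of added reactions, each reaction is a dictionary with hypothetical keys `stoich` and `blast_score`, a higher `blast_score` is assumed to be better, and a multi-reaction hypothesis is scored by its weakest reaction.

```python
def canonical(stoich):
    """Direction-independent key: a reaction and its reverse map to the same key."""
    forward = tuple(sorted(stoich.items()))
    backward = tuple(sorted((met, -coef) for met, coef in stoich.items()))
    return min(forward, backward)


def has_cycle(hypothesis):
    """Criterion (a): flag hypotheses containing two copies of one reaction."""
    keys = [canonical(rxn["stoich"]) for rxn in hypothesis]
    return len(keys) != len(set(keys))


def choose_hypothesis(hypotheses):
    """Apply criteria (a), (b), (c) in order; return None if nothing survives (a)."""
    # (a) Discard hypotheses that form a cycle.
    acyclic = [h for h in hypotheses if not has_cycle(h)]
    if not acyclic:
        return None
    # (b) Keep the hypotheses with the fewest modifications.
    fewest = min(len(h) for h in acyclic)
    smallest = [h for h in acyclic if len(h) == fewest]
    # (c) Prefer the best validation score (assumption: weakest reaction decides).
    return max(smallest, key=lambda h: min(rxn["blast_score"] for rxn in h))


# Toy hypotheses for a problem metabolite B.
h1 = [
    {"name": "M1", "stoich": {"A": -1, "B": 1}, "blast_score": 50},
    {"name": "M1_copy", "stoich": {"A": 1, "B": -1}, "blast_score": 60},
]
h2 = [{"name": "M2", "stoich": {"C": -1, "B": 1}, "blast_score": 120}]
h3 = [{"name": "M3", "stoich": {"D": -1, "B": 1}, "blast_score": 200}]
h4 = [
    {"name": "M4", "stoich": {"E": -1, "F": 1}, "blast_score": 300},
    {"name": "M5", "stoich": {"F": -1, "B": 1}, "blast_score": 300},
]
best = choose_hypothesis([h1, h2, h3, h4])
print(best[0]["name"])
```

Here `h1` is rejected as a cycle, `h4` loses on the number of modifications, and the tie between `h2` and `h3` is broken by the assumed score.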