11. Gapfillling

Model gap filling is the task of figuring out which reactions have to be added to a model to make it feasible. Several such algorithms have been reported e.g. Kumar et al. 2009 and Reed et al. 2006. Cobrapy has a gap filling implementation that is very similar to that of Reed et al. where we use a mixed-integer linear program to figure out the smallest number of reactions that need to be added for a user-defined collection of reactions, i.e. a universal model. Briefly, the problem that we try to solve is

Minimize:

\[\sum_i c_i * z_i\]

subject to

\[Sv = 0\]
\[v^\star \geq t\]
\[l_i\leq v_i \leq u_i\]
\[v_i = 0 \textrm{ if } z_i = 0\]

Where l, u are lower and upper bounds for reaction i and z is an indicator variable that is zero if the reaction is not used and otherwise 1, c is a user-defined cost associated with using the ith reaction, \(v^\star\) is the flux of the objective and t a lower bound for that objective. To demonstrate, let’s take a model and remove some essential reactions from it.

[1]:
from cobra.io import load_model
from cobra.flux_analysis import gapfill
model = load_model("iYS1720")
Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00
Problem data seem to be well scaled

In this model D-Fructose-6-phosphate is an essential metabolite. We will remove all the reactions using it, and at them to a separate model.

[2]:
universal = cobra.Model("universal_reactions")
for i in [i.id for i in model.metabolites.f6p_c.reactions]:
    reaction = model.reactions.get_by_id(i)
    universal.add_reaction(reaction.copy())
    model.remove_reactions([reaction])

Now, because of these gaps, the model won’t grow.

[3]:
model.optimize().objective_value
[3]:
0.0

We will use can use the model’s original objective, growth, to figure out which of the removed reactions are required for the model be feasible again. This is very similar to making the ‘no-growth but growth (NGG)’ predictions of Kumar et al. 2009.

[4]:
solution = gapfill(model, universal, demand_reactions=False)
for reaction in solution[0]:
    print(reaction.id)
GF6PTA
TALA

We can obtain multiple possible reaction sets by having the algorithm go through multiple iterations.

[5]:
result = gapfill(model, universal, demand_reactions=False, iterations=4)
for i, entries in enumerate(result):
    print("---- Run %d ----" % (i + 1))
    for e in entries:
        print(e.id)
---- Run 1 ----
GF6PTA
TALA
---- Run 2 ----
GF6PTA
TALA
---- Run 3 ----
TKT2
GF6PTA
FBP
---- Run 4 ----
GF6PTA
TALA

We can also instead of using the original objective, specify a given metabolite that we want the model to be able to produce.

[6]:
with model:
    model.objective = model.add_boundary(model.metabolites.f6p_c, type='demand')
    solution = gapfill(model, universal)
    for reaction in solution[0]:
        print(reaction.id)
F6PA

Finally, note that using mixed-integer linear programming is computationally quite expensive and for larger models you may want to consider alternative gap filling methods and reconstruction methods.