multiTFA Documentation¶
1. Getting Started¶
multiTFA is a python package for accurate thermodynamic flux analysis. multiTFA is built on COBRApy to take advantage of COBRA model structure. Installation instructions can be found in README.rst.
1.1. Building a multiTFA model¶
To create a multitfa model, we require
A COBRA model
Metabolite database identifiers, such as BIGG, KEGG, SEED etc. - This is required to match and retrieve thermodyanamic information against our database. Metabolite identifiers can be added to the multiTFA model later. For simplicity, we use the same nomneclature as equilibrator-api.
Compartment specific pH and ionic strength information - To calculate Gibbs energy related to transmembrane transport, information about membrane potential is also required in the form of pandas Dataframes.
Optionally, intracellular metabolite concentrations can be provided to calculate the transformed Gibbs energy of reactions. If metabolite concentrations are not available, a broad metabolite concentration range is assumed.
NOTE: All the input information should be in S.I units, e.g: ionic strength, metabolite concentration in Moles.
To demonstrate the usage, we use E. Coli core model from BIGG database.
First step in creating a multitfa model is to get a COBRA model. For this example, we load E. coli core model. This is available in Examples
directory.
[1]:
from cobra.io import load_model
# Load the Cobra model
cobra_model = load_model('e_coli_core')
Let us define the compartment properties such as pH, ionic strength (M) etc. For transport calculations, we will provide membrane potential in mV. For convenience, we use pandas DataFrame
to define these properties.
[2]:
# compartment specific pH and Ionic strength information. Ionic strength units should be in M.
pH_I_dict = {
"pH": {"c": 7.5, "e": 7, "p": 7},
"I": {"c": 0.25, "e": 0, "p": 0},
}
# Membrane potentials between different compartments. Units for membrane potential is mV.
del_psi_dict = {
"c": {"c":0,"e": 0, "p": 150},
"e": {"c": 0,"e":0, "p": 0},
"p": {"c": -150, "e": 0, "p":0},
}
import pandas as pd
# Define membrane potential and compartment information in Dataframes
del_psi = pd.DataFrame.from_dict(data=del_psi_dict)
comp_info = pd.DataFrame.from_dict(data=pH_I_dict)
Now, it is time to list any reactions that needs to be excluded from thermodynamic analysis. For example, Biomass, demand reactions etc.
[3]:
# List of reaction ids that you would like to exclude from thermodynamic analysis, here we exclude demand reactions
Excl = [
rxn.id
for rxn in cobra_model.reactions
if rxn.id.startswith("EX_") or rxn.id.startswith("DM_")
] + ["BIOMASS_Ecoli_core_w_GAM", "O2t", "H2Ot"]
Now, we have all the information we need. We can now create an instance of multitfa model.
Please note, when you use multitfa for the very first time, it downloads datafiles of about 3 GB. This is a one time thing only.
[4]:
# Now build the multiTFA model from COBRA model
from multitfa.core import tmodel
tfa_model = tmodel(
cobra_model, Exclude_list=Excl, compartment_info=comp_info, membrane_potential=del_psi
)
Downloading package metadata...
Fragments already downloaded
Downloading package metadata...
Fragments already downloaded
Downloading package metadata...
Fragments already downloaded
In order to access the thermodynamic information from equilibrator api, we need to add a external database identifier against all the metabolites in tfa model. This can be done by populating the ‘Kegg_id’ property of the metabolites. If you do not know the identifier for some metabolites, it is okay, you can simply add NA
against it. For simplicity, we use the notation similar to equilibrator api. For the example below in the E. coli core model, we use bigg identifiers.
You can follow equilibrator api notation for other database identifiers, for example, if you are using KEGG
identifier, simply use kegg:kegg identifier
.
[5]:
for met in tfa_model.metabolites:
kegg_id = 'bigg.metabolite:'+met.id[:-2]
met.Kegg_id = kegg_id
tfa_model.update()
Now that we have setup the model and populated the thermodynamic constraints, its better to check if constraints are added properly.
[6]:
print("Number of constraints before update {}\n".format(len(cobra_model.constraints)))
print("Number of reactions in the model {}\n".format(len(tfa_model.reactions)))
print("Number of reactions excluded from thermodynamic analysis {}\n".format(len(tfa_model.Exclude_reactions)))
print("Number of mass balance constraints {}\n".format(len(tfa_model.metabolites)))
num_cons = 2*3*(len(tfa_model.reactions) - len(tfa_model.Exclude_reactions)) + len(tfa_model.metabolites)
print("Number of constraints to be present after update {}\n".format(num_cons))
print("Number of constraints present after update {}\n".format(len(tfa_model.constraints)))
Number of constraints before update 72
Number of reactions in the model 95
Number of reactions excluded from thermodynamic analysis 23
Number of mass balance constraints 72
Number of constraints to be present after update 504
Number of constraints present after update 504
1.2 Metabolites¶
Reactions and metabolites in multiTFA
model can be accessed similar to a COBRA model. For demonstration, we will consider cytosolic ATP
as our metabolite with atp_c
identifier in t_model. We will access ATP
and print it’s standard Gibbs energy of formation and Also, change its intracellular concentration lower and upper bounds to ‘2e-3’ and ‘5e-2’ M respectively from it’s default bounds. These changes will reflect in solver variables and constraints.
[7]:
atp = tfa_model.metabolites.get_by_id('atp_c')
print("Standard Gibbs energy of formation of ATP is {} KJ/mol".format(atp.delG_f))
print("initial concentrations")
print('{}\t{}\t{}'.format(atp.id, atp.concentration_min, atp.concentration_max))
atp.concentration_min = 2e-3
atp.concentration_max = 5e-2
print("Concentrations after modifying")
print('{}\t{}\t{}'.format(atp.id, atp.concentration_min, atp.concentration_max))
Standard Gibbs energy of formation of ATP is -2259.1882733696866 KJ/mol
initial concentrations
atp_c 1e-05 0.02
Selected objective sense: MINIMIZE
Selected objective name: f3d52b45-5dba-11eb-bac2-c1c11d7a5b17
Selected RHS name: rhs
Selected bound name: bnd
Concentrations after modifying
atp_c 0.002 0.05
You can always change the Gibbs energy of formation value of a metabolite by resetting the delG_f
property of the metabolite. Also, you can access additional information like, major pseudoisomer at given compartment conditions (major_ms
), equilibrator accession of the metabolite.
1.3 Reactions¶
Reactions in multiTFA model are of type cobra.DictList. Similar to COBRA model, reactions can be accessed/retrieved by its index or by it’s name and properties/attributes can be accessed. Let us consider ATP synthase raction in our model by ‘ATPS4rpp’ and access it’s attributes.
[8]:
atps = tfa_model.reactions.get_by_id('ATPS4r')
print(atps.reaction)
adp_c + 4.0 h_e + pi_c <=> atp_c + h2o_c + 3.0 h_c
Access and change bounds of atps
[9]:
print("lower bound is {} and upper bound is {}\n".format(atps.lower_bound, atps.upper_bound))
atps.lower_bound = 0
print("updated lower and upper bounds are {} \t {}\n".format(atps.lower_bound, atps.upper_bound))
lower bound is -1000.0 and upper bound is 1000.0
updated lower and upper bounds are 0 1000.0
We can access the Gibbs energy of transport and transformed Gibbs energy of the reaction
[10]:
print("Gibbs energy of transport is {} KJ/mol".format(atps.delG_transport))
print("Transform Gibbs energy of is {} KJ/mol".format(atps.delG_prime))
Gibbs energy of transport is -4.605170185988092 KJ/mol
Transform Gibbs energy of is 32.39600409402766 KJ/mol
You can always reset the transport and transformed Gibbs energy of reactions. You can also access the stoichiometric matrix of the reaction, all the thermodynamic constraints associated with the reaction. Please refer to api for all the attributes of the model.
2. Solvers¶
multitfa model offers two different ways to solve the tMFA problem (Please refer to manuscript),
Univariate type, where each group/component is allowed to vary in 95% confidence interval from respective mean. This is formulated as a mixed integer linear programming problem (MILP)
Multivariate, where group/component are drawn from multivariate normal distribution subjected to the associated linear constraints. This is a mixed integer quadratic constraint program (MIQCP). In general, MIQCP are computationally expensive.
multitfa supports multiple solvers using optlang
. We use this to solve the MILP problems mentioned above. Unfortunately, optlang
doesnt support quadratic constraints yet, so we use Cplex
and Gurobi
to solve the MIQC type problems.
Lets see how to use different solvers with E. coli core model example
[1]:
import sys
sys.path.insert(0, "../../examples")
from paper_data_example import build_core_model
Downloading package metadata...
Fragments already downloaded
Downloading package metadata...
Fragments already downloaded
Downloading package metadata...
Fragments already downloaded
[2]:
tfa_model = build_core_model()
We can check which solver tfa_model
using now
[3]:
print(tfa_model.solver.interface)
<module 'optlang.cplex_interface' from '/home/moritz/.pyenv/versions/3.6.12/envs/tmfa/lib/python3.6/site-packages/optlang/cplex_interface.py'>
Now lets change to cplex
. Changing the solver is as simple as
[4]:
tfa_model.solver = 'cplex'
print(tfa_model.solver.interface)
<module 'optlang.cplex_interface' from '/home/moritz/.pyenv/versions/3.6.12/envs/tmfa/lib/python3.6/site-packages/optlang/cplex_interface.py'>
tfa_model
has two separate solver interfaces as properties.
cplex_interface
gurobi_interface
Depending on what solver is installed in your environment and what solver you have selected (as shown in previous section), one of the two interfaces above will be active. These are the Model
containers of respective solvers. You can access them the following way,
[5]:
tfa_model.cplex_interface
Selected objective sense: MINIMIZE
Selected objective name: aeb0183b-5dba-11eb-bac2-c1c11d7a5b17
Selected RHS name: rhs
Selected bound name: bnd
[5]:
<cplex.Cplex at 0x7f05ee229c88>
2.1 Solving a multiTFA model¶
Once, we populate the model with thermodynamic constraints as described in section 1.1
, it is time to solve the model. We demonstrate how to solve the model using both univariate and multivariate methods.
First lets solve using univariate method, lets call it box
method.
[6]:
solution_box = tfa_model.optimize(solve_method='mip') # Solve using MILP method
print(solution_box)
<Solution 0.874 at 0x7f05ef657ba8>
Solution for the optimization is stored in the solution_box
, which is instance of the Solution
class. You can access the attributes like fluxes
, Gibbs_energies
and metabolite_concentrations
.
[7]:
print(solution_box.fluxes,"\n")
print(solution_box.Gibbs_energies)
PFK 7.477382
PFL 0.000000
PGI 4.860861
PGK -16.023526
PGL 4.959985
...
NADH16 38.534610
NADTRHD 0.000000
NH4t 4.765319
O2t 21.799493
PDH 9.282533
Name: fluxes, Length: 95, dtype: float64
dG_err_glc__D_e -1.565741
dG_err_gln__L_c -2.667186
dG_err_gln__L_e -2.667186
dG_err_glu__L_c -2.189853
dG_err_glu__L_e -2.189853
...
dG_NADTRHD_reverse_49725 0.368593
dG_NH4t 9.161904
dG_NH4t_reverse_551ee -9.161904
dG_PDH -6.724673
dG_PDH_reverse_ca160 6.724673
Name: Gibbs_energies, Length: 216, dtype: float64
Now, lets solve the model with MIQC problem with cplex interface.
[8]:
solution_miqc = tfa_model.optimize() # In this case we are solving the MIQCP
print(solution_miqc)
<Solution -0.874 at 0x7f05ef657a20>
Attributes of solution_miqc
can be accessed as described above.
Please note, there is a bug in cplex
that reads the model objective in reverse when creating a new model. For example,
max a
is read as min -a
So, when reading cplex_interface
solution please reverse the sign.
3. Thermodynamic variability analysis (TVA)¶
There exists multiple flux states that define the same optimum. TVA predicts the ranges of variables such as metabolic fluxes, Gibbs energy of reactions and metabolite concentrations by taking into account of thermodynamic uncertainities using multivariate approach.
As described previously, you can solve the TVA problem on either the MILP or MIQCP. Although, MILP is faster compared to solving MIQCP, MIQCP is thermodynamically consistent as they drawn from multivariate distribution.
There are three different functions for running TVA in multiTFA. One for the MILP, the remaining two for MIQCP, depending on the solver you use (either Gurobi or Cplex).
By default, TVA is performed on all the variables in a model unless variable_list
parameter is specified. It is a list
of variable names you want to perform TVA on. For example reactons id’s or names of Gibbs energy variables etc.
The example below demonstrates how to perform TVA on E. coli core model with constraints described in previous sections. We will use cplex
for our calculations. In our observations Cplex
ourperforms Gurobi
.
[1]:
import sys
sys.path.insert(0, "../../examples")
from paper_data_example import build_core_model
Downloading package metadata...
Fragments already downloaded
Downloading package metadata...
Fragments already downloaded
Downloading package metadata...
Fragments already downloaded
[2]:
tfa_model = build_core_model()
from multitfa import analysis
dg_vars = [rxn.delG_forward.name for rxn in tfa_model.reactions if rxn.id not in tfa_model.Exclude_reactions] # Gibbs energy variable names
Now let us perform TVA on tfa_model
using univariate approch and calculate ranges on reaction flux variables
[3]:
tfa_model.solver = 'cplex'
ranges_box = analysis.variability(tfa_model, variable_list = dg_vars) # TVA using MILP
print(ranges_box)
minimum maximum
dG_PFK -65.527361 26.205939
dG_PFL -72.197674 33.074152
dG_PGI -19.085713 24.931203
dG_PGK 0.000000 59.219274
dG_PGL -40.671665 28.293843
... ... ...
dG_ME2 -70.983610 57.688660
dG_NADH16 -177.843571 -61.742679
dG_NADTRHD -58.319625 31.641641
dG_NH4t -13.742207 32.066015
dG_PDH -116.749219 31.763632
[72 rows x 2 columns]
ranges_box
is a pandas DataFrame
which can be readily exported to a csv.
[4]:
ranges_box.to_csv('ranges.csv')
Now lets solve the TVA with multivariate approach using MIQCP. There are two functions one when using Gurobi
and another for Cplex
.
[5]:
ranges_MIQC_cplex = analysis.variability_legacy_cplex(tfa_model,variable_list = dg_vars) # TVA using MIQCP with cplex
print(ranges_MIQC_cplex)
Selected objective sense: MINIMIZE
Selected objective name: 510d6206-5dba-11eb-bac2-c1c11d7a5b17
Selected RHS name: rhs
Selected bound name: bnd
Selected objective sense: MINIMIZE
Selected objective name: 510d6206-5dba-11eb-bac2-c1c11d7a5b17
Selected RHS name: rhs
Selected bound name: bnd
minimum maximum
dG_PFK -6.349569e+01 24.174265
dG_PFL -7.084883e+01 31.725309
dG_PGI -1.957310e+01 25.255346
dG_PGK 4.296464e-09 56.021187
dG_PGL -4.961171e+01 37.233887
... ... ...
dG_ME2 -8.241713e+01 69.129417
dG_NADH16 -2.107758e+02 -33.736290
dG_NADTRHD -4.100270e+01 23.163226
dG_NH4t -9.670293e+00 27.994102
dG_PDH -1.200410e+02 31.809464
[72 rows x 2 columns]
Similarly, if you need to use Gurobi
, you should use variability_legacy_gurobi
function. Optionally, if you wish to warm start the optimization, you should specify warm_start
parameter.
Please note, runnig TVA on genome scale model is computationally expensive, especially using MIQC problems. We have observed that relaxing the Gap
parameter helped achieving faster run times.
3.1 Alternative way to solve MIQC¶
When Gurobi
or Cplex
is not available, and if you wanted to run MIQC problem, we developed a sampling based approach to solve the MIQCP. We sample the uncertainity variables on the surface of the ellipsoid and solve the subsequent MILP problem. This can be solved using MILP solvers. The exit criterion of the sampler can be chosen as either
the number of samples since last improvement or
a fixed number of samples followed by use of a generalized extreme value distribution to infer the maximum value.
[6]:
gev_ranges, gev_samples = analysis.gev_sampling(tfa_model, variable_list=dg_vars, cutoff = 100)
/home/moritz/.pyenv/versions/3.6.12/envs/tmfa/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:400: RuntimeWarning: invalid value encountered in double_scalars
return m3 / np.power(m2, 1.5)
/home/moritz/.pyenv/versions/3.6.12/envs/tmfa/lib/python3.6/site-packages/scipy/stats/_continuous_distns.py:2666: RuntimeWarning: invalid value encountered in subtract
-pex2+logpex2-logex2)
[7]:
print(gev_samples)
minimum maximum minimum maximum minimum \
dG_PFK 0.0 6.950160e-310 -57.126492 18.202297 -56.355503
dG_PFL 0.0 6.950167e-310 -57.487150 17.841639 -56.888039
dG_PGI 0.0 6.950162e-310 -15.744384 21.920011 -15.992455
dG_PGK 0.0 6.950167e-310 0.000000 50.801699 0.000000
dG_PGL 0.0 6.950167e-310 -33.529149 22.967443 -33.880045
... ... ... ... ... ...
dG_ME2 0.0 6.950236e-310 -48.030535 46.130452 -57.819277
dG_NADH16 0.0 6.950166e-310 -165.244331 -101.608240 -157.179347
dG_NADTRHD 0.0 6.950166e-310 -37.952069 16.907125 -37.511270
dG_NH4t 0.0 6.950236e-310 -9.670293 27.994102 -9.670293
dG_PDH 0.0 6.950236e-310 -87.076334 14.224151 -92.323210
maximum minimum maximum minimum maximum ... \
dG_PFK 18.973286 -58.369161 16.959629 -57.249094 18.079695 ...
dG_PFL 18.440751 -57.556992 17.771798 -57.588522 17.740267 ...
dG_PGI 21.671940 -16.329137 21.335258 -15.801123 21.863272 ...
dG_PGK 50.528925 0.000000 50.929908 0.000000 51.194993 ...
dG_PGL 22.616547 -32.791313 23.705279 -33.802262 22.694330 ...
... ... ... ... ... ... ...
dG_ME2 36.341710 -44.943975 49.217011 -49.060255 45.100732 ...
dG_NADH16 -93.221033 -146.633223 -83.709714 -154.574690 -91.433939 ...
dG_NADTRHD 18.001545 -38.347183 15.548170 -38.247234 16.037089 ...
dG_NH4t 27.994102 -9.670293 27.994102 -9.670293 27.994102 ...
dG_PDH 9.299499 -79.044267 20.157648 -89.783288 11.021858 ...
minimum maximum minimum maximum minimum \
dG_PFK -57.361601 17.967189 -57.003212 18.325577 -57.404395
dG_PFL -57.450801 17.877988 -59.125999 16.202791 -57.345710
dG_PGI -15.155035 22.509359 -15.955762 21.708633 -16.048769
dG_PGK 0.000000 50.182441 0.000000 50.874615 0.000000
dG_PGL -38.775775 17.720817 -37.063765 19.432827 -35.456340
... ... ... ... ... ...
dG_ME2 -46.149320 48.011667 -55.544344 38.616642 -57.539017
dG_NADH16 -147.696089 -84.412259 -149.651005 -85.956494 -161.408033
dG_NADTRHD -37.990377 16.290842 -37.853004 17.440327 -37.910998
dG_NH4t -9.670293 27.994102 -9.670293 27.994102 -9.670293
dG_PDH -84.435829 14.996814 -91.232542 10.126363 -93.275548
maximum minimum maximum minimum maximum
dG_PFK 17.924394 -57.339358 17.989432 -57.041507 18.287283
dG_PFL 17.983080 -54.138492 21.190297 -55.180897 20.147892
dG_PGI 21.615625 -16.099379 21.565016 -15.495405 22.168989
dG_PGK 50.131498 0.000000 51.501463 0.000000 50.647421
dG_PGL 21.040252 -36.490842 20.005750 -35.674083 20.822509
... ... ... ... ... ...
dG_ME2 36.621969 -56.770878 37.390109 -57.634138 36.526849
dG_NADH16 -98.454626 -148.205144 -85.254659 -154.115709 -90.590310
dG_NADTRHD 16.392663 -37.467504 17.599419 -37.682847 17.236616
dG_NH4t 27.994102 -9.670293 27.994102 -9.670293 27.994102
dG_PDH 7.342254 -93.586076 7.028803 -94.904108 6.285686
[72 rows x 202 columns]
gev_samples
is the alternate set of Gibbs energy of reactions and gev_ranges
is the max-min ranges of Gibbs energy of reactions. By specifying cutoff
parameter, we specify how many samples we use to train the extreme value distribution (default, 1000). The more samples the better.
Alternatively, you could choose to use the other exit crieteria where if we don’t see improvement since a defined number of samples.
[8]:
cutoff_ranges, cutoff_samples, num_samples = analysis.cutoff_sampling(tfa_model, cutoff=10, variable_list=dg_vars)
Here, cutoff_ranges
, cutoff_samples
, num_samples
represents max-min Gibbs energy ranges of reactions, alternate Gibbs energy of reactions and num of samples to achieve the exit.
[9]:
print(cutoff_samples)
minimum maximum minimum maximum minimum maximum \
dG_PFK 0.0 19.994462 -57.381509 17.947280 -58.071537 17.257252
dG_PFL 0.0 21.678685 -60.857036 14.471753 -58.668320 16.660470
dG_PGI 0.0 22.843247 -16.019409 21.644985 -15.831312 21.833083
dG_PGK 0.0 52.777768 0.000000 51.082344 0.000000 51.522022
dG_PGL 0.0 25.614162 -36.254780 20.241812 -35.437215 21.059377
... ... ... ... ... ... ...
dG_ME2 0.0 50.162726 -53.493241 40.667746 -46.053466 48.107521
dG_NADH16 0.0 -50.696450 -161.222356 -98.560476 -146.618738 -84.279283
dG_NADTRHD 0.0 19.027818 -38.046908 15.441167 -37.461931 16.860187
dG_NH4t 0.0 27.993973 -9.670293 27.994102 -9.670293 27.994102
dG_PDH 0.0 19.494280 -93.291169 7.035106 -81.992453 17.703287
minimum maximum minimum maximum ... minimum \
dG_PFK -57.041093 18.287697 -57.075243 18.253547 ... -56.378078
dG_PFL -57.517947 17.810842 -53.692094 21.636696 ... -58.208255
dG_PGI -15.754052 21.910342 -16.727598 20.936797 ... -16.119889
dG_PGK 0.000000 50.589288 0.000000 51.062350 ... 0.000000
dG_PGL -32.623016 23.873576 -31.511479 24.985113 ... -31.982957
... ... ... ... ... ... ...
dG_ME2 -54.088576 40.072410 -54.682189 39.478797 ... -56.093864
dG_NADH16 -149.744292 -85.730297 -163.037216 -99.770295 ... -159.546298
dG_NADTRHD -37.532229 17.851851 -38.441989 16.392084 ... -37.486707
dG_NH4t -9.670293 27.994102 -9.670293 27.994102 ... -9.670293
dG_PDH -88.726706 12.951684 -89.362655 11.568661 ... -94.402224
maximum minimum maximum minimum maximum \
dG_PFK 18.950711 -56.928126 18.400663 -58.604627 16.724162
dG_PFL 17.120534 -57.489656 17.839133 -56.151032 19.177758
dG_PGI 21.544506 -15.870513 21.793882 -16.476337 21.188058
dG_PGK 50.387229 0.000000 49.334484 0.000000 50.463590
dG_PGL 24.513635 -37.281422 19.215170 -36.178770 20.317822
... ... ... ... ... ...
dG_ME2 38.067123 -49.907013 44.253974 -50.748066 43.412921
dG_NADH16 -95.548451 -157.436641 -93.708334 -159.538517 -95.410680
dG_NADTRHD 17.803702 -37.723270 17.408852 -37.521470 17.786028
dG_NH4t 27.994102 -9.670293 27.994102 -9.670293 27.994102
dG_PDH 7.260018 -85.238851 15.998135 -85.434103 16.358129
minimum maximum minimum maximum
dG_PFK -57.907485 17.421305 -56.873109 18.455681
dG_PFL -56.184370 19.144420 -60.974034 14.354756
dG_PGI -15.347801 22.316594 -16.130546 21.533848
dG_PGK 0.000000 50.097605 0.000000 50.037495
dG_PGL -36.129072 20.367520 -32.519994 23.976598
... ... ... ... ...
dG_ME2 -52.093254 42.067733 -51.961487 42.199500
dG_NADH16 -151.118261 -87.355727 -158.860935 -95.815908
dG_NADTRHD -38.440229 17.719414 -37.858607 17.706442
dG_NH4t -9.670293 27.994102 -9.670293 27.994102
dG_PDH -88.995090 12.431838 -89.969652 10.739770
[72 rows x 24 columns]
3.2 Network embeded thermodynamics (NET)¶
By using the TVA apparoches demonstrated above, one can perform NET type of analysis. For example, we can fix the directionalities of all the reactions and sample for the metabolite concentrations.
Reaction directionalities can be fixed by adjusting the lower or upper bounds of the reactions.
[10]:
atps = tfa_model.reactions.get_by_id('ATPS4r')
print("lower bound is {} and upper bound is {}\n".format(atps.lower_bound, atps.upper_bound))
atps.lower_bound = 0
print("updated lower and upper bounds are {} \t {}\n".format(atps.lower_bound, atps.upper_bound))
lower bound is -1000.0 and upper bound is 1000.0
updated lower and upper bounds are 0 1000.0
We can access the metabolite concentration variables as shown below,
[11]:
concentration_vars = [var.name for var in tfa_model.variables if var.name.startswith('lnc_')]
tfa_model.solver = 'cplex'
concentration_ranges = analysis.variability_legacy_cplex(tfa_model,variable_list = concentration_vars) # TVA using MIQCP with cplex
print(concentration_ranges)
Selected objective sense: MINIMIZE
Selected objective name: 510d6206-5dba-11eb-bac2-c1c11d7a5b17
Selected RHS name: rhs
Selected bound name: bnd
minimum maximum
lnc_glc__D_e -11.512925 -3.912023
lnc_gln__L_c -11.512925 -3.912023
lnc_gln__L_e -11.512925 -3.912023
lnc_glu__L_c -11.512925 -3.912023
lnc_glu__L_e -11.512925 -3.912023
... ... ...
lnc_fru_e -11.512925 -3.912023
lnc_fum_c -11.512925 -3.912023
lnc_fum_e -11.512925 -3.912023
lnc_g3p_c -11.512925 -3.912023
lnc_g6p_c -11.512925 -3.912023
[72 rows x 2 columns]
4. Debugging¶
It is often observed in genome scale models that adding thermodynamic constraints make the model infeasible. This might be due to variety of reasons, like ill-formatted covariance between groups/components, lumping of reactions etc. It is always not easy to find the exact cause of the problem. But, if you are using Gurobi/Cplex, you can find the constraints that render the model infeasible.
Please note that these methods give you random set of constraints that represent infeasibility. Also, a model can contain multiple infeasible sets. While modifying/removing these constraints, it is advisable to look for biological meaning.
Let us demonstrate the usage by breaking the model.
[1]:
import sys
sys.path.insert(0, "../../examples")
from paper_data_example import build_core_model
Downloading package metadata...
Fragments already downloaded
Downloading package metadata...
Fragments already downloaded
Downloading package metadata...
Fragments already downloaded
[2]:
tfa_model = build_core_model()
Using license file /home/vishnu/gurobi.lic
Academic license - for non-commercial use only
Read LP format model from file /tmp/tmpdzvydb7y.lp
Reading time = 0.00 seconds
: 72 rows, 190 columns, 720 nonzeros
Now lets change the concentration atp and find the objective value. Then, we set the lower bound of objective variable higher than the solution to break the model.
[3]:
atp = tfa_model.metabolites.get_by_id('atp_c')
atp.concentration_min = 2e-3
atp.concentration_max = 5e-2
solution = tfa_model.optimize()
print(solution)
<Solution 0.374 at 0x7f3c6ad4dda0>
Now lets adjust the biomass lower bound to 0.5 and find out the IIS set of reactions using Gurobi
[4]:
biomass = tfa_model.reactions.get_by_id('BIOMASS_Ecoli_core_w_GAM')
biomass.lower_bound = 0.5
solution = tfa_model.optimize()
tfa_model.solver.problem.computeIIS() # model.solver.problem represents the solver interface model
for cons in tfa_model.solver.problem.getConstrs():
if cons.IISConstr:
print(cons)
<gurobi.Constr glc__D_e>
<gurobi.Constr gln__L_c>
<gurobi.Constr gln__L_e>
<gurobi.Constr glu__L_c>
<gurobi.Constr glu__L_e>
<gurobi.Constr icit_c>
<gurobi.Constr lac__D_c>
<gurobi.Constr lac__D_e>
<gurobi.Constr mal__L_c>
<gurobi.Constr mal__L_e>
<gurobi.Constr 13dpg_c>
<gurobi.Constr 2pg_c>
<gurobi.Constr 3pg_c>
<gurobi.Constr oaa_c>
<gurobi.Constr pep_c>
<gurobi.Constr 6pgc_c>
<gurobi.Constr 6pgl_c>
<gurobi.Constr pyr_c>
<gurobi.Constr pyr_e>
<gurobi.Constr r5p_c>
<gurobi.Constr ru5p__D_c>
<gurobi.Constr acald_c>
<gurobi.Constr s7p_c>
<gurobi.Constr acald_e>
<gurobi.Constr accoa_c>
<gurobi.Constr succ_c>
<gurobi.Constr succ_e>
<gurobi.Constr succoa_c>
<gurobi.Constr acon_C_c>
<gurobi.Constr xu5p__D_c>
<gurobi.Constr actp_c>
<gurobi.Constr adp_c>
<gurobi.Constr akg_c>
<gurobi.Constr akg_e>
<gurobi.Constr amp_c>
<gurobi.Constr cit_c>
<gurobi.Constr dhap_c>
<gurobi.Constr e4p_c>
<gurobi.Constr etoh_c>
<gurobi.Constr etoh_e>
<gurobi.Constr f6p_c>
<gurobi.Constr fdp_c>
<gurobi.Constr fru_e>
<gurobi.Constr fum_c>
<gurobi.Constr fum_e>
<gurobi.Constr g3p_c>
<gurobi.Constr g6p_c>
<gurobi.Constr directionality_ACONTb>
<gurobi.Constr ind_ACONTb>
<gurobi.Constr delG_ACONTb>
<gurobi.Constr directionality_ATPS4r>
<gurobi.Constr ind_ATPS4r>
<gurobi.Constr delG_ATPS4r>
This returned us a list of constraints that can be modified to make the model status optimal. One has to choose and modify the list of constraints to manipulate based on biological meaning.