{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Thermodynamic variability analysis (TVA)\n",
    "\n",
    "There exists multiple flux states that define the same optimum. TVA predicts the ranges of variables such as metabolic fluxes, Gibbs energy of reactions and metabolite concentrations by taking into account of thermodynamic uncertainities using multivariate approach.\n",
    "\n",
    "As described previously, you can solve the TVA problem on either the MILP or MIQCP. Although, MILP is faster compared to solving MIQCP, MIQCP is thermodynamically consistent as they drawn from multivariate distribution. \n",
    "\n",
    "There are three different functions for running TVA in multiTFA. One for the MILP, the remaining two for MIQCP, depending on the solver you use (either Gurobi or Cplex).\n",
    "\n",
    "By default, TVA is performed on all the variables in a model unless `variable_list` parameter is specified. It is a `list` of variable names you want to perform TVA on. For example reactons id's or names of Gibbs energy variables etc.\n",
    "\n",
    "The example below demonstrates how to perform TVA on *E. coli* core model with constraints described in previous sections. We will use `cplex` for our calculations. In our observations `Cplex` ourperforms `Gurobi`. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading package metadata...\n",
      "Fragments already downloaded\n",
      "Downloading package metadata...\n",
      "Fragments already downloaded\n",
      "Downloading package metadata...\n",
      "Fragments already downloaded\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "sys.path.insert(0, \"../../examples\")\n",
    "from paper_data_example import build_core_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "tfa_model = build_core_model()\n",
    "\n",
    "from multitfa import analysis\n",
    "dg_vars = [rxn.delG_forward.name for rxn in tfa_model.reactions if rxn.id not in tfa_model.Exclude_reactions] # Gibbs energy variable names\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let us perform TVA on `tfa_model` using univariate approch and calculate ranges on reaction flux variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "               minimum    maximum\n",
      "dG_PFK      -65.527361  26.205939\n",
      "dG_PFL      -72.197674  33.074152\n",
      "dG_PGI      -19.085713  24.931203\n",
      "dG_PGK        0.000000  59.219274\n",
      "dG_PGL      -40.671665  28.293843\n",
      "...                ...        ...\n",
      "dG_ME2      -70.983610  57.688660\n",
      "dG_NADH16  -177.843571 -61.742679\n",
      "dG_NADTRHD  -58.319625  31.641641\n",
      "dG_NH4t     -13.742207  32.066015\n",
      "dG_PDH     -116.749219  31.763632\n",
      "\n",
      "[72 rows x 2 columns]\n"
     ]
    }
   ],
   "source": [
    "tfa_model.solver = 'cplex'\n",
    "ranges_box = analysis.variability(tfa_model, variable_list = dg_vars) # TVA using MILP\n",
    "print(ranges_box)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " `ranges_box` is a `pandas DataFrame` which can be readily exported to a csv. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "ranges_box.to_csv('ranges.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now lets solve the TVA with multivariate approach using MIQCP. There are two functions one when using `Gurobi` and another for `Cplex`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Selected objective sense:  MINIMIZE\n",
      "Selected objective  name:  510d6206-5dba-11eb-bac2-c1c11d7a5b17\n",
      "Selected RHS        name:  rhs\n",
      "Selected bound      name:  bnd\n",
      "\n",
      "Selected objective sense:  MINIMIZE\n",
      "Selected objective  name:  510d6206-5dba-11eb-bac2-c1c11d7a5b17\n",
      "Selected RHS        name:  rhs\n",
      "Selected bound      name:  bnd\n",
      "                 minimum    maximum\n",
      "dG_PFK     -6.349569e+01  24.174265\n",
      "dG_PFL     -7.084883e+01  31.725309\n",
      "dG_PGI     -1.957310e+01  25.255346\n",
      "dG_PGK      4.296464e-09  56.021187\n",
      "dG_PGL     -4.961171e+01  37.233887\n",
      "...                  ...        ...\n",
      "dG_ME2     -8.241713e+01  69.129417\n",
      "dG_NADH16  -2.107758e+02 -33.736290\n",
      "dG_NADTRHD -4.100270e+01  23.163226\n",
      "dG_NH4t    -9.670293e+00  27.994102\n",
      "dG_PDH     -1.200410e+02  31.809464\n",
      "\n",
      "[72 rows x 2 columns]\n"
     ]
    }
   ],
   "source": [
    "ranges_MIQC_cplex = analysis.variability_legacy_cplex(tfa_model,variable_list = dg_vars) # TVA using MIQCP with cplex\n",
    "print(ranges_MIQC_cplex)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, if you need to use `Gurobi`, you should use `variability_legacy_gurobi` function. Optionally, if you wish to warm start the optimization, you should specify `warm_start` parameter."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Please note, runnig TVA on genome scale model is computationally expensive, especially using MIQC problems. We have observed that relaxing the `Gap` parameter helped achieving faster run times. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.1 Alternative way to solve MIQC\n",
    "\n",
    "When `Gurobi` or `Cplex` is not available, and if you wanted to run MIQC problem, we developed a sampling based approach to solve the MIQCP. We sample the uncertainity variables on the surface of the ellipsoid and solve the subsequent MILP problem. This can be solved using MILP solvers. The exit criterion of the sampler can be chosen as either\n",
    "\n",
    "(1) the number of samples since last improvement or \n",
    "(2) a fixed number of samples followed by use of a generalized extreme value distribution to infer the maximum value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/moritz/.pyenv/versions/3.6.12/envs/tmfa/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:400: RuntimeWarning: invalid value encountered in double_scalars\n",
      "  return m3 / np.power(m2, 1.5)\n",
      "/home/moritz/.pyenv/versions/3.6.12/envs/tmfa/lib/python3.6/site-packages/scipy/stats/_continuous_distns.py:2666: RuntimeWarning: invalid value encountered in subtract\n",
      "  -pex2+logpex2-logex2)\n"
     ]
    }
   ],
   "source": [
    "gev_ranges, gev_samples = analysis.gev_sampling(tfa_model, variable_list=dg_vars, cutoff = 100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "            minimum        maximum     minimum     maximum     minimum  \\\n",
      "dG_PFK          0.0  6.950160e-310  -57.126492   18.202297  -56.355503   \n",
      "dG_PFL          0.0  6.950167e-310  -57.487150   17.841639  -56.888039   \n",
      "dG_PGI          0.0  6.950162e-310  -15.744384   21.920011  -15.992455   \n",
      "dG_PGK          0.0  6.950167e-310    0.000000   50.801699    0.000000   \n",
      "dG_PGL          0.0  6.950167e-310  -33.529149   22.967443  -33.880045   \n",
      "...             ...            ...         ...         ...         ...   \n",
      "dG_ME2          0.0  6.950236e-310  -48.030535   46.130452  -57.819277   \n",
      "dG_NADH16       0.0  6.950166e-310 -165.244331 -101.608240 -157.179347   \n",
      "dG_NADTRHD      0.0  6.950166e-310  -37.952069   16.907125  -37.511270   \n",
      "dG_NH4t         0.0  6.950236e-310   -9.670293   27.994102   -9.670293   \n",
      "dG_PDH          0.0  6.950236e-310  -87.076334   14.224151  -92.323210   \n",
      "\n",
      "              maximum     minimum    maximum     minimum    maximum  ...  \\\n",
      "dG_PFK      18.973286  -58.369161  16.959629  -57.249094  18.079695  ...   \n",
      "dG_PFL      18.440751  -57.556992  17.771798  -57.588522  17.740267  ...   \n",
      "dG_PGI      21.671940  -16.329137  21.335258  -15.801123  21.863272  ...   \n",
      "dG_PGK      50.528925    0.000000  50.929908    0.000000  51.194993  ...   \n",
      "dG_PGL      22.616547  -32.791313  23.705279  -33.802262  22.694330  ...   \n",
      "...               ...         ...        ...         ...        ...  ...   \n",
      "dG_ME2      36.341710  -44.943975  49.217011  -49.060255  45.100732  ...   \n",
      "dG_NADH16  -93.221033 -146.633223 -83.709714 -154.574690 -91.433939  ...   \n",
      "dG_NADTRHD  18.001545  -38.347183  15.548170  -38.247234  16.037089  ...   \n",
      "dG_NH4t     27.994102   -9.670293  27.994102   -9.670293  27.994102  ...   \n",
      "dG_PDH       9.299499  -79.044267  20.157648  -89.783288  11.021858  ...   \n",
      "\n",
      "               minimum    maximum     minimum    maximum     minimum  \\\n",
      "dG_PFK      -57.361601  17.967189  -57.003212  18.325577  -57.404395   \n",
      "dG_PFL      -57.450801  17.877988  -59.125999  16.202791  -57.345710   \n",
      "dG_PGI      -15.155035  22.509359  -15.955762  21.708633  -16.048769   \n",
      "dG_PGK        0.000000  50.182441    0.000000  50.874615    0.000000   \n",
      "dG_PGL      -38.775775  17.720817  -37.063765  19.432827  -35.456340   \n",
      "...                ...        ...         ...        ...         ...   \n",
      "dG_ME2      -46.149320  48.011667  -55.544344  38.616642  -57.539017   \n",
      "dG_NADH16  -147.696089 -84.412259 -149.651005 -85.956494 -161.408033   \n",
      "dG_NADTRHD  -37.990377  16.290842  -37.853004  17.440327  -37.910998   \n",
      "dG_NH4t      -9.670293  27.994102   -9.670293  27.994102   -9.670293   \n",
      "dG_PDH      -84.435829  14.996814  -91.232542  10.126363  -93.275548   \n",
      "\n",
      "              maximum     minimum    maximum     minimum    maximum  \n",
      "dG_PFK      17.924394  -57.339358  17.989432  -57.041507  18.287283  \n",
      "dG_PFL      17.983080  -54.138492  21.190297  -55.180897  20.147892  \n",
      "dG_PGI      21.615625  -16.099379  21.565016  -15.495405  22.168989  \n",
      "dG_PGK      50.131498    0.000000  51.501463    0.000000  50.647421  \n",
      "dG_PGL      21.040252  -36.490842  20.005750  -35.674083  20.822509  \n",
      "...               ...         ...        ...         ...        ...  \n",
      "dG_ME2      36.621969  -56.770878  37.390109  -57.634138  36.526849  \n",
      "dG_NADH16  -98.454626 -148.205144 -85.254659 -154.115709 -90.590310  \n",
      "dG_NADTRHD  16.392663  -37.467504  17.599419  -37.682847  17.236616  \n",
      "dG_NH4t     27.994102   -9.670293  27.994102   -9.670293  27.994102  \n",
      "dG_PDH       7.342254  -93.586076   7.028803  -94.904108   6.285686  \n",
      "\n",
      "[72 rows x 202 columns]\n"
     ]
    }
   ],
   "source": [
    "print(gev_samples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`gev_samples` is the alternate set of Gibbs energy of reactions and `gev_ranges` is the max-min ranges of Gibbs energy of reactions. By specifying `cutoff` parameter, we specify how many samples we use to train the extreme value distribution (default, 1000). The more samples the better. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you could choose to use the other exit crieteria where if we don't see improvement since a defined number of samples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "cutoff_ranges, cutoff_samples, num_samples = analysis.cutoff_sampling(tfa_model, cutoff=10, variable_list=dg_vars)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, `cutoff_ranges`, `cutoff_samples`, `num_samples` represents max-min Gibbs energy ranges of reactions, alternate Gibbs energy of reactions and num of samples to achieve the exit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "            minimum    maximum     minimum    maximum     minimum    maximum  \\\n",
      "dG_PFK          0.0  19.994462  -57.381509  17.947280  -58.071537  17.257252   \n",
      "dG_PFL          0.0  21.678685  -60.857036  14.471753  -58.668320  16.660470   \n",
      "dG_PGI          0.0  22.843247  -16.019409  21.644985  -15.831312  21.833083   \n",
      "dG_PGK          0.0  52.777768    0.000000  51.082344    0.000000  51.522022   \n",
      "dG_PGL          0.0  25.614162  -36.254780  20.241812  -35.437215  21.059377   \n",
      "...             ...        ...         ...        ...         ...        ...   \n",
      "dG_ME2          0.0  50.162726  -53.493241  40.667746  -46.053466  48.107521   \n",
      "dG_NADH16       0.0 -50.696450 -161.222356 -98.560476 -146.618738 -84.279283   \n",
      "dG_NADTRHD      0.0  19.027818  -38.046908  15.441167  -37.461931  16.860187   \n",
      "dG_NH4t         0.0  27.993973   -9.670293  27.994102   -9.670293  27.994102   \n",
      "dG_PDH          0.0  19.494280  -93.291169   7.035106  -81.992453  17.703287   \n",
      "\n",
      "               minimum    maximum     minimum    maximum  ...     minimum  \\\n",
      "dG_PFK      -57.041093  18.287697  -57.075243  18.253547  ...  -56.378078   \n",
      "dG_PFL      -57.517947  17.810842  -53.692094  21.636696  ...  -58.208255   \n",
      "dG_PGI      -15.754052  21.910342  -16.727598  20.936797  ...  -16.119889   \n",
      "dG_PGK        0.000000  50.589288    0.000000  51.062350  ...    0.000000   \n",
      "dG_PGL      -32.623016  23.873576  -31.511479  24.985113  ...  -31.982957   \n",
      "...                ...        ...         ...        ...  ...         ...   \n",
      "dG_ME2      -54.088576  40.072410  -54.682189  39.478797  ...  -56.093864   \n",
      "dG_NADH16  -149.744292 -85.730297 -163.037216 -99.770295  ... -159.546298   \n",
      "dG_NADTRHD  -37.532229  17.851851  -38.441989  16.392084  ...  -37.486707   \n",
      "dG_NH4t      -9.670293  27.994102   -9.670293  27.994102  ...   -9.670293   \n",
      "dG_PDH      -88.726706  12.951684  -89.362655  11.568661  ...  -94.402224   \n",
      "\n",
      "              maximum     minimum    maximum     minimum    maximum  \\\n",
      "dG_PFK      18.950711  -56.928126  18.400663  -58.604627  16.724162   \n",
      "dG_PFL      17.120534  -57.489656  17.839133  -56.151032  19.177758   \n",
      "dG_PGI      21.544506  -15.870513  21.793882  -16.476337  21.188058   \n",
      "dG_PGK      50.387229    0.000000  49.334484    0.000000  50.463590   \n",
      "dG_PGL      24.513635  -37.281422  19.215170  -36.178770  20.317822   \n",
      "...               ...         ...        ...         ...        ...   \n",
      "dG_ME2      38.067123  -49.907013  44.253974  -50.748066  43.412921   \n",
      "dG_NADH16  -95.548451 -157.436641 -93.708334 -159.538517 -95.410680   \n",
      "dG_NADTRHD  17.803702  -37.723270  17.408852  -37.521470  17.786028   \n",
      "dG_NH4t     27.994102   -9.670293  27.994102   -9.670293  27.994102   \n",
      "dG_PDH       7.260018  -85.238851  15.998135  -85.434103  16.358129   \n",
      "\n",
      "               minimum    maximum     minimum    maximum  \n",
      "dG_PFK      -57.907485  17.421305  -56.873109  18.455681  \n",
      "dG_PFL      -56.184370  19.144420  -60.974034  14.354756  \n",
      "dG_PGI      -15.347801  22.316594  -16.130546  21.533848  \n",
      "dG_PGK        0.000000  50.097605    0.000000  50.037495  \n",
      "dG_PGL      -36.129072  20.367520  -32.519994  23.976598  \n",
      "...                ...        ...         ...        ...  \n",
      "dG_ME2      -52.093254  42.067733  -51.961487  42.199500  \n",
      "dG_NADH16  -151.118261 -87.355727 -158.860935 -95.815908  \n",
      "dG_NADTRHD  -38.440229  17.719414  -37.858607  17.706442  \n",
      "dG_NH4t      -9.670293  27.994102   -9.670293  27.994102  \n",
      "dG_PDH      -88.995090  12.431838  -89.969652  10.739770  \n",
      "\n",
      "[72 rows x 24 columns]\n"
     ]
    }
   ],
   "source": [
    "print(cutoff_samples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.2 Network embeded thermodynamics (NET)\n",
    "\n",
    "By using the TVA apparoches demonstrated above, one can perform NET type of analysis. For example, we can fix the directionalities of all the reactions and sample for the metabolite concentrations. \n",
    "\n",
    "Reaction directionalities can be fixed by adjusting the lower or upper bounds of the reactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "lower bound is -1000.0 and upper bound is 1000.0\n",
      "\n",
      "updated lower and upper bounds are 0 \t 1000.0\n",
      "\n"
     ]
    }
   ],
   "source": [
    "atps = tfa_model.reactions.get_by_id('ATPS4r')\n",
    "print(\"lower bound is {} and upper bound is {}\\n\".format(atps.lower_bound, atps.upper_bound))\n",
    "atps.lower_bound = 0\n",
    "print(\"updated lower and upper bounds are {} \\t {}\\n\".format(atps.lower_bound, atps.upper_bound))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can access the metabolite concentration variables as shown below,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Selected objective sense:  MINIMIZE\n",
      "Selected objective  name:  510d6206-5dba-11eb-bac2-c1c11d7a5b17\n",
      "Selected RHS        name:  rhs\n",
      "Selected bound      name:  bnd\n",
      "                minimum   maximum\n",
      "lnc_glc__D_e -11.512925 -3.912023\n",
      "lnc_gln__L_c -11.512925 -3.912023\n",
      "lnc_gln__L_e -11.512925 -3.912023\n",
      "lnc_glu__L_c -11.512925 -3.912023\n",
      "lnc_glu__L_e -11.512925 -3.912023\n",
      "...                 ...       ...\n",
      "lnc_fru_e    -11.512925 -3.912023\n",
      "lnc_fum_c    -11.512925 -3.912023\n",
      "lnc_fum_e    -11.512925 -3.912023\n",
      "lnc_g3p_c    -11.512925 -3.912023\n",
      "lnc_g6p_c    -11.512925 -3.912023\n",
      "\n",
      "[72 rows x 2 columns]\n"
     ]
    }
   ],
   "source": [
    "concentration_vars = [var.name for var in tfa_model.variables if var.name.startswith('lnc_')]\n",
    "tfa_model.solver = 'cplex'\n",
    "concentration_ranges = analysis.variability_legacy_cplex(tfa_model,variable_list = concentration_vars) # TVA using MIQCP with cplex\n",
    "print(concentration_ranges)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}