{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Flux sampling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The easiest way to get started with flux sampling is the `sample` function in the `cobra.sampling` submodule. `sample` takes at least two arguments: a cobra model and the number of samples you want to generate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Scaling...\n",
      " A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00\n",
      "Problem data seem to be well scaled\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ACALD</th>\n",
       "      <th>ACALDt</th>\n",
       "      <th>ACKr</th>\n",
       "      <th>ACONTa</th>\n",
       "      <th>ACONTb</th>\n",
       "      <th>ACt2r</th>\n",
       "      <th>ADK1</th>\n",
       "      <th>AKGDH</th>\n",
       "      <th>AKGt2r</th>\n",
       "      <th>ALCD2x</th>\n",
       "      <th>...</th>\n",
       "      <th>RPI</th>\n",
       "      <th>SUCCt2_2</th>\n",
       "      <th>SUCCt3</th>\n",
       "      <th>SUCDi</th>\n",
       "      <th>SUCOAS</th>\n",
       "      <th>TALA</th>\n",
       "      <th>THD2</th>\n",
       "      <th>TKT1</th>\n",
       "      <th>TKT2</th>\n",
       "      <th>TPI</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.771296</td>\n",
       "      <td>-0.431093</td>\n",
       "      <td>-2.284246</td>\n",
       "      <td>6.735302</td>\n",
       "      <td>6.735302</td>\n",
       "      <td>-2.284246</td>\n",
       "      <td>2.273930</td>\n",
       "      <td>3.942050</td>\n",
       "      <td>-1.589360</td>\n",
       "      <td>-0.340203</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.314416</td>\n",
       "      <td>6.997002</td>\n",
       "      <td>8.082133</td>\n",
       "      <td>335.977726</td>\n",
       "      <td>-3.942050</td>\n",
       "      <td>2.152983</td>\n",
       "      <td>12.850348</td>\n",
       "      <td>2.152983</td>\n",
       "      <td>2.088065</td>\n",
       "      <td>7.542984</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-2.089680</td>\n",
       "      <td>-1.099843</td>\n",
       "      <td>-0.386453</td>\n",
       "      <td>10.477790</td>\n",
       "      <td>10.477790</td>\n",
       "      <td>-0.386453</td>\n",
       "      <td>3.396770</td>\n",
       "      <td>3.163168</td>\n",
       "      <td>-1.592767</td>\n",
       "      <td>-0.989837</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.756359</td>\n",
       "      <td>3.093051</td>\n",
       "      <td>3.415053</td>\n",
       "      <td>540.804734</td>\n",
       "      <td>-3.163168</td>\n",
       "      <td>1.657479</td>\n",
       "      <td>56.649368</td>\n",
       "      <td>1.657479</td>\n",
       "      <td>1.617715</td>\n",
       "      <td>8.029587</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-1.108346</td>\n",
       "      <td>-0.126460</td>\n",
       "      <td>-1.198639</td>\n",
       "      <td>5.057819</td>\n",
       "      <td>5.057819</td>\n",
       "      <td>-1.198639</td>\n",
       "      <td>7.154043</td>\n",
       "      <td>0.313155</td>\n",
       "      <td>-0.227554</td>\n",
       "      <td>-0.981886</td>\n",
       "      <td>...</td>\n",
       "      <td>-4.491355</td>\n",
       "      <td>7.873466</td>\n",
       "      <td>8.606818</td>\n",
       "      <td>558.331088</td>\n",
       "      <td>-0.313155</td>\n",
       "      <td>4.295540</td>\n",
       "      <td>12.141283</td>\n",
       "      <td>4.295540</td>\n",
       "      <td>4.216795</td>\n",
       "      <td>5.181243</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-1.239111</td>\n",
       "      <td>-0.334024</td>\n",
       "      <td>-1.284023</td>\n",
       "      <td>12.035499</td>\n",
       "      <td>12.035499</td>\n",
       "      <td>-1.284023</td>\n",
       "      <td>19.790232</td>\n",
       "      <td>1.359155</td>\n",
       "      <td>-0.007846</td>\n",
       "      <td>-0.905088</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.063827</td>\n",
       "      <td>9.002800</td>\n",
       "      <td>10.772472</td>\n",
       "      <td>647.037371</td>\n",
       "      <td>-1.359155</td>\n",
       "      <td>2.016495</td>\n",
       "      <td>26.609381</td>\n",
       "      <td>2.016495</td>\n",
       "      <td>1.997461</td>\n",
       "      <td>7.712080</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-1.943290</td>\n",
       "      <td>-1.571257</td>\n",
       "      <td>-0.842072</td>\n",
       "      <td>11.035900</td>\n",
       "      <td>11.035900</td>\n",
       "      <td>-0.842072</td>\n",
       "      <td>15.963163</td>\n",
       "      <td>0.288986</td>\n",
       "      <td>-0.861444</td>\n",
       "      <td>-0.372033</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.468256</td>\n",
       "      <td>7.188997</td>\n",
       "      <td>8.085781</td>\n",
       "      <td>368.039875</td>\n",
       "      <td>-0.288986</td>\n",
       "      <td>1.465663</td>\n",
       "      <td>5.730886</td>\n",
       "      <td>1.465663</td>\n",
       "      <td>1.464620</td>\n",
       "      <td>8.298285</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 95 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      ACALD    ACALDt      ACKr     ACONTa     ACONTb     ACt2r       ADK1  \\\n",
       "0 -0.771296 -0.431093 -2.284246   6.735302   6.735302 -2.284246   2.273930   \n",
       "1 -2.089680 -1.099843 -0.386453  10.477790  10.477790 -0.386453   3.396770   \n",
       "2 -1.108346 -0.126460 -1.198639   5.057819   5.057819 -1.198639   7.154043   \n",
       "3 -1.239111 -0.334024 -1.284023  12.035499  12.035499 -1.284023  19.790232   \n",
       "4 -1.943290 -1.571257 -0.842072  11.035900  11.035900 -0.842072  15.963163   \n",
       "\n",
       "      AKGDH    AKGt2r    ALCD2x  ...       RPI  SUCCt2_2     SUCCt3  \\\n",
       "0  3.942050 -1.589360 -0.340203  ... -2.314416  6.997002   8.082133   \n",
       "1  3.163168 -1.592767 -0.989837  ... -1.756359  3.093051   3.415053   \n",
       "2  0.313155 -0.227554 -0.981886  ... -4.491355  7.873466   8.606818   \n",
       "3  1.359155 -0.007846 -0.905088  ... -2.063827  9.002800  10.772472   \n",
       "4  0.288986 -0.861444 -0.372033  ... -1.468256  7.188997   8.085781   \n",
       "\n",
       "        SUCDi    SUCOAS      TALA       THD2      TKT1      TKT2       TPI  \n",
       "0  335.977726 -3.942050  2.152983  12.850348  2.152983  2.088065  7.542984  \n",
       "1  540.804734 -3.163168  1.657479  56.649368  1.657479  1.617715  8.029587  \n",
       "2  558.331088 -0.313155  4.295540  12.141283  4.295540  4.216795  5.181243  \n",
       "3  647.037371 -1.359155  2.016495  26.609381  2.016495  1.997461  7.712080  \n",
       "4  368.039875 -0.288986  1.465663   5.730886  1.465663  1.464620  8.298285  \n",
       "\n",
       "[5 rows x 95 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from cobra.io import load_model\n",
    "from cobra.sampling import sample\n",
    "\n",
    "model = load_model(\"textbook\")\n",
    "s = sample(model, 100)\n",
    "s.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, `sample` uses the `optgp` method, based on the [method presented here](http://dx.doi.org/10.1371/journal.pone.0086587), because it is suited for larger models and can run in parallel. The sampler uses a single process unless told otherwise; this can be changed with the `processes` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "One process:\n",
      "CPU times: user 9.8 s, sys: 333 ms, total: 10.1 s\n",
      "Wall time: 9.21 s\n",
      "Two processes:\n",
      "CPU times: user 186 ms, sys: 41.2 ms, total: 227 ms\n",
      "Wall time: 5.26 s\n"
     ]
    }
   ],
   "source": [
    "print(\"One process:\")\n",
    "%time s = sample(model, 1000)\n",
    "print(\"Two processes:\")\n",
    "%time s = sample(model, 1000, processes=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you can use Artificial Centering Hit-and-Run for sampling by setting the method to `achr`. `achr` does not support parallel execution but has good convergence and is almost Markovian."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = sample(model, 100, method=\"achr\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In general, setting up the sampler is expensive since initial search directions are generated by solving many linear programming problems. Thus, we recommend generating as many samples as possible in one go. However, this might require finer control over the sampling procedure, as described in the following section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sampler objects"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sampling process can be controlled on a lower level by using the sampler classes directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from cobra.sampling import OptGPSampler, ACHRSampler"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both sampler classes share a standardized interface and take some additional arguments, for instance the `thinning` factor. \"Thinning\" means recording a sample only every n iterations. A higher thinning factor means less correlated samples but also longer computation times. By default the samplers use a thinning factor of 100, which creates roughly uncorrelated samples. If you want fewer samples but better mixing, feel free to increase this parameter. If you want to study convergence for your own model, you might want to set it to 1 to obtain all iterates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "achr = ACHRSampler(model, thinning=10)"
   ]
  },
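  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of what thinning does, the following sketch draws two short ACHR chains with different thinning factors and compares the lag-1 autocorrelation of a single flux. The reaction `TPI`, the chain length of 200, and the names `chain_raw` and `chain_thin` are arbitrary choices made for this example, not recommendations. It reuses the `model` loaded in the first cell. A real convergence analysis would look at many reactions and much longer chains."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from cobra.sampling import ACHRSampler\n",
    "\n",
    "# Illustrative sketch only: one unthinned chain and one thinned chain.\n",
    "chain_raw = ACHRSampler(model, thinning=1).sample(200)\n",
    "chain_thin = ACHRSampler(model, thinning=100).sample(200)\n",
    "\n",
    "# Series.autocorr correlates the column with itself shifted by `lag` rows,\n",
    "# so values near 0 suggest consecutive samples are close to uncorrelated.\n",
    "print(\"thinning=1:  \", chain_raw[\"TPI\"].autocorr(lag=1))\n",
    "print(\"thinning=100:\", chain_thin[\"TPI\"].autocorr(lag=1))"
   ]
  },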
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`OptGPSampler` has an additional `processes` argument specifying how many processes are used to create parallel sampling chains. For maximum efficiency this should be on the order of the number of CPU cores you have. As noted before, class initialization can take up to a few minutes because the initial search directions have to be generated. Sampling, on the other hand, is quick."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "optgp = OptGPSampler(model, processes=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sampling and validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both samplers have a `sample` method that generates samples from the initialized object and behaves like the `sample` function described above, except that it accepts only a single argument: the number of samples. For `OptGPSampler` the number of samples should be a multiple of the number of processes; otherwise it will be increased to the nearest multiple automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "s1 = achr.sample(100)\n",
    "\n",
    "s2 = optgp.sample(100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can call `sample` repeatedly, and both samplers are optimized to generate large numbers of samples without falling into \"numerical traps\". All sampler objects have a `validate` method that checks whether a set of points is feasible and reports any feasibility violations as a short code. The short code is a combination of any of the following letters:\n",
    "\n",
    "- \"v\" - valid point\n",
    "- \"l\" - lower bound violation\n",
    "- \"u\" - upper bound violation\n",
    "- \"e\" - equality violation (meaning the point is not a steady state)\n",
    "\n",
    "For instance, for a random flux distribution (which should not be feasible):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['le'], dtype='<U3')"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "bad = np.random.uniform(-1000, 1000, size=len(model.reactions))\n",
    "achr.validate(np.atleast_2d(bad))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And for our generated samples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v',\n",
       "       'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v',\n",
       "       'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v',\n",
       "       'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v',\n",
       "       'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v',\n",
       "       'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v',\n",
       "       'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v',\n",
       "       'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v', 'v'], dtype='<U3')"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "achr.validate(s1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Even though most models are numerically stable enough that the sampler should only generate valid samples, we still urge you to check this. `validate` is fast, even for large models and many samples. If you find invalid samples, you do not necessarily have to rerun the entire sampling but can exclude them from the sample DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1_valid = s1[achr.validate(s1) == \"v\"]\n",
    "len(s1_valid)"
   ]
  },
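  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When there are many samples, a tally of the validation codes can be easier to read than the raw array. A minimal sketch, reusing the `optgp` sampler and the `s2` samples from above (the names `codes` and `counts` are introduced only for this example):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Count how often each validation code occurs in the OptGP samples.\n",
    "codes, counts = np.unique(optgp.validate(s2), return_counts=True)\n",
    "for code, count in zip(codes, counts):\n",
    "    print(code, count)"
   ]
  },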
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Batch sampling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sampler objects are made for generating billions of samples; however, using the `sample` function might quickly fill up your RAM when working with genome-scale models. Here, the `batch` method of the sampler objects might come in handy. `batch` takes two arguments: the number of samples in each batch and the number of batches. This will make sense with a small example.\n",
    "\n",
    "Let's assume we want to quantify what proportion of our samples will grow. For that we might generate 10 batches of 100 samples each and measure what percentage of the 100 samples in each batch show a growth rate larger than 0.1. Finally, we calculate the mean and standard deviation of those per-batch percentages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Usually 6.30% +- 1.49% grow...\n"
     ]
    }
   ],
   "source": [
    "counts = [np.mean(s.Biomass_Ecoli_core > 0.1) for s in optgp.batch(100, 10)]\n",
    "print(\"Usually {:.2f}% +- {:.2f}% grow...\".format(\n",
    "    np.mean(counts) * 100.0, np.std(counts) * 100.0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding constraints"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Flux sampling will respect additional constraints defined in the model. For instance, we can add a constraint enforcing growth in a similar manner to the previous section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "co = model.problem.Constraint(model.reactions.Biomass_Ecoli_core.flux_expression, lb=0.1)\n",
    "model.add_cons_vars([co])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Note that this is only for demonstration purposes. Usually you could set the lower bound of the reaction directly instead of creating a new constraint.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0    0.106248\n",
      "1    0.116061\n",
      "2    0.113978\n",
      "3    0.179056\n",
      "4    0.117057\n",
      "5    0.111005\n",
      "6    0.182250\n",
      "7    0.114853\n",
      "8    0.128597\n",
      "9    0.160970\n",
      "Name: Biomass_Ecoli_core, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "s = sample(model, 10)\n",
    "print(s.Biomass_Ecoli_core)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see, our new constraint was respected."
   ]
  }
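  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The note above mentions setting the lower bound of the reaction directly. A minimal sketch of that alternative follows. It uses a freshly loaded copy of the model so that it does not interact with the constraint added earlier; the names `fresh_model` and `s_bound` are introduced only for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from cobra.io import load_model\n",
    "from cobra.sampling import sample\n",
    "\n",
    "fresh_model = load_model(\"textbook\")\n",
    "# Require a growth rate of at least 0.1 through the reaction bound itself.\n",
    "fresh_model.reactions.Biomass_Ecoli_core.lower_bound = 0.1\n",
    "\n",
    "s_bound = sample(fresh_model, 10)\n",
    "# The smallest sampled growth rate should not fall below the bound.\n",
    "print(s_bound.Biomass_Ecoli_core.min())"
   ]
  }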
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}