:py:mod:`cobra.sampling`
========================

.. py:module:: cobra.sampling


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   achr/index.rst
   core/index.rst
   hr_sampler/index.rst
   optgp/index.rst
   sampling/index.rst


Package Contents
----------------

Classes
~~~~~~~

.. autoapisummary::

   cobra.sampling.HRSampler
   cobra.sampling.ACHRSampler
   cobra.sampling.OptGPSampler


Functions
~~~~~~~~~

.. autoapisummary::

   cobra.sampling.shared_np_array
   cobra.sampling.step
   cobra.sampling.sample


.. py:class:: HRSampler(model: cobra.Model, thinning: int, nproj: Optional[int] = None, seed: Optional[int] = None, **kwargs)


   Bases: :py:obj:`abc.ABC`

   The abstract base class for hit-and-run samplers.

   New samplers should derive from this class where possible to provide
   a uniform interface.

   :param model: The cobra model from which to generate samples.
   :type model: cobra.Model
   :param thinning: The thinning factor of the generated sampling chain. A thinning of
                    10 means samples are returned every 10 steps.
   :type thinning: int
   :param nproj: How often to reproject the sampling point into the feasibility
                 space. Avoids numerical issues at the cost of slower sampling. If
                 you observe many equality constraint violations with
                 `sampler.validate` you should lower this number (default None).
   :type nproj: int > 0, optional
   :param seed: Sets the random number seed. Initialized to the current time stamp
                if None (default None).
   :type seed: int > 0, optional
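
   As a rough illustration of the `thinning` parameter, the sketch below
   keeps every `thinning`-th state of a chain. It is plain Python and does
   not call cobra; the helper name `thin_chain` is hypothetical and only
   mirrors the description above.

   .. code-block:: python

      def thin_chain(chain, thinning):
          """Keep every `thinning`-th state of a chain (hypothetical helper)."""
          if thinning < 1:
              raise ValueError("thinning must be a positive integer")
          # Steps are counted from 1, so step number `thinning` is the first kept.
          return chain[thinning - 1 :: thinning]


      steps = list(range(1, 31))    # stand-in for 30 successive chain states
      kept = thin_chain(steps, 10)  # one state is kept every 10 steps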

   .. attribute:: feasibility_tol

      The tolerance used for checking equalities feasibility.

      :type: float

   .. attribute:: bounds_tol

      The tolerance used for checking bounds feasibility.

      :type: float

   .. attribute:: n_samples

      The total number of samples that have been generated by this
      sampler instance.

      :type: int

   .. attribute:: retries

      The overall number of sampling retries the sampler has observed. Larger
      values indicate numerical instabilities.

      :type: int

   .. attribute:: problem

      A NamedTuple whose attributes define the entire sampling problem in
      matrix form.

      :type: Problem

   .. attribute:: warmup

      A numpy matrix with as many columns as reactions in the model and
      more than 3 rows containing a warmup sample in each row. None if no
      warmup points have been generated yet.

      :type: numpy.matrix

   .. attribute:: fwd_idx

      A numpy array having one entry for each reaction in the model,
      containing the index of the respective forward variable.

      :type: numpy.array

   .. attribute:: rev_idx

      A numpy array having one entry for each reaction in the model,
      containing the index of the respective reverse variable.

      :type: numpy.array

   .. py:method:: __build_problem() -> Problem

      Build the matrix representation of the sampling problem.

      :returns: The matrix representation in the form of a NamedTuple.
      :rtype: Problem


   .. py:method:: generate_fva_warmup() -> None

      Generate the warmup points for the sampler.

      Generates warmup points by setting each flux as the sole objective
      and minimizing/maximizing it. Also caches the projection of the
      warmup points into the nullspace for non-homogeneous problems (only
      if necessary).

      :raises ValueError: If the flux cone contains only a single point or the
          problem is inhomogeneous.


   .. py:method:: _reproject(p: numpy.ndarray) -> numpy.ndarray

      Reproject a point into the feasibility region.

      This function is guaranteed to return a new feasible point. However,
      no guarantee can be made in terms of proximity to the original
      point.

      :param p: The current sample point.
      :type p: numpy.array

      :returns: A new feasible point. If `p` is feasible, it will return `p`.
      :rtype: numpy.array


   .. py:method:: _random_point() -> numpy.ndarray

      Find an approximately random point in the flux cone.


   .. py:method:: _is_redundant(matrix: numpy.matrix, cutoff: Optional[float] = None) -> bool

      Identify redundant rows in a matrix that can be removed.


   .. py:method:: _bounds_dist(p: numpy.ndarray) -> numpy.ndarray

      Get the distances to the lower and upper bounds. Negative values
      indicate a bound violation.


   .. py:method:: sample(n: int, fluxes: bool = True) -> pandas.DataFrame
      :abstractmethod:

      Abstract sampling function.

      Should be overridden by child classes.

      :param n: The number of samples that are generated at once.
      :type n: int
      :param fluxes: Whether to return fluxes or the internal solver variables. If
                     set to False, will return a variable for each forward and
                     backward flux as well as all additional variables you might
                     have defined in the model (default True).
      :type fluxes: bool, optional

      :returns: Returns a pandas DataFrame with `n` rows, each containing a
                flux sample.
      :rtype: pandas.DataFrame


   .. py:method:: batch(batch_size: int, batch_num: int, fluxes: bool = True) -> pandas.DataFrame

      Create a batch generator.

      This is useful to generate `batch_num` batches of `batch_size`
      samples each.

      :param batch_size: The number of samples contained in each batch.
      :type batch_size: int
      :param batch_num: The number of batches in the generator.
      :type batch_num: int
      :param fluxes: Whether to return fluxes or the internal solver variables. If
                     set to False, will return a variable for each forward and
                     backward flux as well as all additional variables you might
                     have defined in the model (default True).
      :type fluxes: bool, optional

      :Yields: *pandas.DataFrame* -- A DataFrame with dimensions (batch_size x n_r) containing
               a valid flux sample for a total of n_r reactions (or variables
               if fluxes=False) in each row.
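
      The generator pattern described above can be sketched in plain
      Python. This is only an illustration under stated assumptions:
      `batches` and `toy_sample` are hypothetical stand-ins, and nested
      lists take the place of the DataFrames a real sampler returns.

      .. code-block:: python

         import random

         rng = random.Random(42)


         def toy_sample(n):
             # Hypothetical stand-in for a sampler's `sample` method:
             # returns n rows, each holding two made-up "flux" values.
             return [[rng.uniform(-1.0, 1.0), rng.uniform(0.0, 10.0)] for _ in range(n)]


         def batches(sample_fn, batch_size, batch_num):
             """Yield `batch_num` batches of `batch_size` samples each."""
             for _ in range(batch_num):
                 yield sample_fn(batch_size)


         # Batches are produced lazily, one per iteration of the loop.
         sizes = [len(batch) for batch in batches(toy_sample, batch_size=5, batch_num=3)]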


   .. py:method:: validate(samples: numpy.matrix) -> numpy.ndarray

      Validate a set of samples for equality and inequality feasibility.

      Can be used to check whether the generated samples and warmup points
      are feasible.

      :param samples: Must be of dimension (samples x n_reactions). Contains the
                      samples to be validated. Samples must be from fluxes.
      :type samples: numpy.matrix

      :returns: A one-dimensional numpy array containing
                a code of 1 to 3 letters denoting the validation result:

                - 'v' means feasible in bounds and equality constraints
                - 'l' means a lower bound violation
                - 'u' means an upper bound violation
                - 'e' means an equality constraint violation

      :rtype: numpy.array

      :raises ValueError: If `samples` has the wrong number of columns.
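
      To make the result codes concrete, here is a minimal sketch that
      assigns them for a toy problem with one equality constraint. It is
      not the library's implementation: `validation_codes` is a
      hypothetical helper, and it assumes a single tolerance `tol` for
      both bounds and equalities.

      .. code-block:: python

         def validation_codes(samples, S, lower, upper, tol=1e-6):
             """Return one code string per sample (hypothetical helper)."""
             codes = []
             for v in samples:
                 code = ""
                 if any(x < lb - tol for x, lb in zip(v, lower)):
                     code += "l"
                 if any(x > ub + tol for x, ub in zip(v, upper)):
                     code += "u"
                 # Equality check: each row of S times v must be close to zero.
                 if any(abs(sum(s * x for s, x in zip(row, v))) > tol for row in S):
                     code += "e"
                 codes.append(code or "v")
             return codes


         S = [[1.0, -1.0]]  # one metabolite balance: v0 - v1 = 0
         lower, upper = [0.0, 0.0], [10.0, 10.0]
         samples = [[2.0, 2.0], [-1.0, -1.0], [11.0, 11.0], [3.0, 4.0], [-1.0, 11.0]]
         codes = validation_codes(samples, S, lower, upper)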


.. py:function:: shared_np_array(shape: Tuple[int, int], data: Optional[numpy.ndarray] = None, integer: bool = False) -> numpy.ndarray

   Create a new numpy array that resides in shared memory.

   :param shape: The shape of the new array.
   :type shape: tuple of int
   :param data: Data to copy to the new array. Has to have the same shape
                (default None).
   :type data: numpy.array, optional
   :param integer: Whether to use an integer array. By default, float array is used
                   (default False).
   :type integer: bool, optional

   :returns: The newly created shared numpy array.
   :rtype: numpy.array

   :raises ValueError: If the size of the input `data` (if provided) does not
       match the size of the created array.


.. py:class:: ACHRSampler(model: cobra.Model, thinning: int = 100, nproj: Optional[int] = None, seed: Optional[int] = None, **kwargs)


   Bases: :py:obj:`cobra.sampling.hr_sampler.HRSampler`

   Artificial Centering Hit-and-Run sampler.

   A sampler with low memory footprint and good convergence.

   :param model: The cobra model from which to generate samples.
   :type model: cobra.Model
   :param thinning: The thinning factor of the generated sampling chain. A thinning of
                    10 means samples are returned every 10 steps (default 100).
   :type thinning: int, optional
   :param nproj: How often to reproject the sampling point into the feasibility
                 space. Avoids numerical issues at the cost of slower sampling. If
                 you observe many equality constraint violations with
                 `sampler.validate` you should lower this number (default None).
   :type nproj: int > 0, optional
   :param seed: Sets the random number seed. Initialized to the current time stamp
                if None (default None).
   :type seed: int > 0, optional

   .. attribute:: n_samples

      The total number of samples that have been generated by this
      sampler instance.

      :type: int

   .. attribute:: problem

      A NamedTuple whose attributes define the entire sampling problem in
      matrix form.

      :type: typing.NamedTuple

   .. attribute:: warmup

      A numpy matrix with as many columns as reactions in the model and
      more than 3 rows containing a warmup sample in each row. None if no
      warmup points have been generated yet.

      :type: numpy.matrix

   .. attribute:: retries

      The overall number of sampling retries the sampler has observed. Larger
      values indicate numerical instabilities.

      :type: int

   .. attribute:: fwd_idx

      A numpy array having one entry for each reaction in the model,
      containing the index of the respective forward variable.

      :type: numpy.array

   .. attribute:: rev_idx

      A numpy array having one entry for each reaction in the model,
      containing the index of the respective reverse variable.

      :type: numpy.array

   .. attribute:: prev

      The current/last flux sample generated.

      :type: numpy.array

   .. attribute:: center

      The center of the sampling space as estimated by the mean of all
      previously generated samples.

      :type: numpy.array

   .. rubric:: Notes

   ACHR generates samples by choosing new directions from the sampling
   space's center and the warmup points. The implementation used here is
   the same as in the MATLAB COBRA Toolbox [2]_ and uses only the initial
   warmup points to generate new directions and not any other previous
   iterations. This usually gives better mixing, since the startup points
   are chosen to span the space in a wide manner. This also makes the
   generated sampling chain quasi-Markovian since the center converges
   rapidly.

   Memory usage is roughly in the order of (2 * number of reactions) ^ 2
   due to the required nullspace matrices and warmup points. So, large
   models easily take up a few GBs of RAM.
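
   The order-of-magnitude estimate above can be worked through as
   follows. This is only a back-of-the-envelope sketch: it assumes 8-byte
   floating point entries and a constant factor of one, and the helper
   name `estimated_memory_gb` is hypothetical.

   .. code-block:: python

      def estimated_memory_gb(n_reactions, bytes_per_entry=8):
          """Rough estimate from (2 * n_reactions)^2 matrix entries."""
          entries = (2 * n_reactions) ** 2
          return entries * bytes_per_entry / 1e9


      small = estimated_memory_gb(95)      # small model: well below 1 MB
      large = estimated_memory_gb(10_000)  # 10,000 reactions: about 3.2 GB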

   .. rubric:: References

   .. [1] Direction Choice for Accelerated Convergence in Hit-and-Run Sampling
      David E. Kaufman, Robert L. Smith
      Operations Research 1998 46:1, 84-95
      https://doi.org/10.1287/opre.46.1.84

   .. [2] https://github.com/opencobra/cobratoolbox

   .. py:method:: __single_iteration() -> None

      Run a single iteration of the sampling.


   .. py:method:: sample(n: int, fluxes: bool = True) -> pandas.DataFrame

      Generate a set of samples.

      This is the basic sampling function for all hit-and-run samplers.

      :param n: The number of samples that are generated at once.
      :type n: int
      :param fluxes: Whether to return fluxes or the internal solver variables. If
                     set to False, will return a variable for each forward and
                     backward flux as well as all additional variables you might
                     have defined in the model (default True).
      :type fluxes: bool, optional

      :returns: Returns a pandas DataFrame with `n` rows, each containing a
                flux sample.
      :rtype: pandas.DataFrame

      .. rubric:: Notes

      The runtime of this function scales linearly with the number
      of reactions in your model and the thinning factor.


.. py:function:: step(sampler: cobra.sampling.hr_sampler.HRSampler, x: numpy.ndarray, delta: numpy.ndarray, fraction: Optional[float] = None, tries: int = 0) -> numpy.ndarray

   Sample a new feasible point from the point `x` in direction `delta`.

   This is the low-level sampling stepper for samplers derived
   from `HRSampler`. Currently, it's used by `ACHRSampler` and
   `OptGPSampler`.

   It's declared outside of the base sampling class to facilitate use of
   multiprocessing.

   :param sampler: The sampler to sample a step for.
   :type sampler: cobra.sampling.HRSampler
   :param x: A point in the sampling region.
   :type x: np.array
   :param delta: The direction to take the step in.
   :type delta: np.array
   :param fraction: A float controlling the part of alpha difference to contribute to
                    the fraction of `delta` (default None). If None, alpha is obtained
                    from a normal distribution.
   :type fraction: float, optional
   :param tries: Total number of tries (default 0).
   :type tries: int, optional

   :returns: The new numpy array obtained after a step of sampling.
   :rtype: np.array

   :raises RuntimeError: If `tries` exceeds `MAX_TRIES`.
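
   The idea of stepping from `x` along `delta` while staying feasible can
   be sketched for a simple box-constrained region. This is a simplified
   illustration, not the library's code: `box_step` is a hypothetical
   helper, it ignores equality constraints and retries, and when
   `fraction` is None it draws alpha uniformly from the feasible range,
   which is an assumption of this example.

   .. code-block:: python

      import random


      def box_step(x, delta, lower, upper, fraction=None, rng=random):
          """One hit-and-run step inside a box (hypothetical helper)."""
          lo, hi = float("-inf"), float("inf")
          for xi, di, lb, ub in zip(x, delta, lower, upper):
              if di == 0.0:
                  continue  # this coordinate does not move along delta
              # Step sizes at which this coordinate reaches its two bounds.
              a, b = sorted(((lb - xi) / di, (ub - xi) / di))
              lo, hi = max(lo, a), min(hi, b)
          if lo == float("-inf") or hi == float("inf"):
              raise ValueError("delta must have at least one non-zero entry")
          if fraction is None:
              alpha = rng.uniform(lo, hi)
          else:
              alpha = lo + fraction * (hi - lo)
          return [xi + alpha * di for xi, di in zip(x, delta)]


      # The feasible alpha range here is [-1, 3]; fraction=0.5 picks alpha = 1.
      x_new = box_step([1.0, 1.0], [1.0, 0.5], [0.0, 0.0], [4.0, 4.0], fraction=0.5)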


.. py:class:: OptGPSampler(model: cobra.Model, thinning: int = 100, processes: Optional[int] = None, nproj: Optional[int] = None, seed: Optional[int] = None, **kwargs)


   Bases: :py:obj:`cobra.sampling.hr_sampler.HRSampler`

   Improved Artificial Centering Hit-and-Run sampler.

   A sampler with fast convergence and parallel execution.
   See [1]_ for details.

   :param model: The cobra model from which to generate samples.
   :type model: cobra.Model
   :param processes: The number of processes used during sampling
                     (default cobra.Configuration.processes).
   :type processes: int, optional
   :param thinning: The thinning factor of the generated sampling chain. A thinning of
                    10 means samples are returned every 10 steps (default 100).
   :type thinning: int, optional
   :param nproj: How often to reproject the sampling point into the feasibility
                 space. Avoids numerical issues at the cost of slower sampling. If
                 you observe many equality constraint violations with
                 `sampler.validate` you should lower this number (default None).
   :type nproj: int > 0, optional
   :param seed: Sets the random number seed. Initialized to the current time stamp
                if None (default None).
   :type seed: int > 0, optional

   .. attribute:: n_samples

      The total number of samples that have been generated by this
      sampler instance.

      :type: int

   .. attribute:: problem

      A NamedTuple whose attributes define the entire sampling problem in
      matrix form.

      :type: typing.NamedTuple

   .. attribute:: warmup

      A numpy matrix with as many columns as reactions in the model and
      more than 3 rows containing a warmup sample in each row. None if no
      warmup points have been generated yet.

      :type: numpy.matrix

   .. attribute:: retries

      The overall number of sampling retries the sampler has observed. Larger
      values indicate numerical instabilities.

      :type: int

   .. attribute:: fwd_idx

      A numpy array having one entry for each reaction in the model,
      containing the index of the respective forward variable.

      :type: numpy.array

   .. attribute:: rev_idx

      A numpy array having one entry for each reaction in the model,
      containing the index of the respective reverse variable.

      :type: numpy.array

   .. attribute:: prev

      The current/last flux sample generated.

      :type: numpy.array

   .. attribute:: center

      The center of the sampling space as estimated by the mean of all
      previously generated samples.

      :type: numpy.array

   .. rubric:: Notes

   The sampler is very similar to artificial centering where each process
   samples its own chain. Initial points are chosen randomly from the
   warmup points followed by a linear transformation that pulls the points
   a little bit towards the center of the sampling space.

   If the requested number of samples is not a multiple of the number of
   processes, it is adjusted to the smallest multiple of the number of
   processes larger than the requested sample number. For instance, if you
   have 3 processes and request 8 samples, you will receive 9.
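
   The adjustment of the sample count can be reproduced with a short
   calculation. The helper name `adjusted_sample_count` is hypothetical;
   the sketch only restates the arithmetic of the example above.

   .. code-block:: python

      import math


      def adjusted_sample_count(n, processes):
          """Smallest multiple of `processes` that is at least `n`."""
          per_process = math.ceil(n / processes)
          return per_process * processes


      total = adjusted_sample_count(8, 3)  # each of 3 processes draws 3 samples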

   Memory usage is roughly in the order of (2 * number of reactions)^2
   due to the required nullspace matrices and warmup points. So, large
   models easily take up a few GBs of RAM. However, most of the large
   matrices are kept in shared memory. So the RAM usage is independent of
   the number of processes.

   .. rubric:: References

   .. [1] Megchelenbrink W, Huynen M, Marchiori E (2014)
      optGpSampler: An Improved Tool for Uniformly Sampling the Solution-Space
      of Genome-Scale Metabolic Networks.
      PLoS ONE 9(2): e86587.
      https://doi.org/10.1371/journal.pone.0086587

   .. py:method:: sample(n: int, fluxes: bool = True) -> pandas.DataFrame

      Generate a set of samples.

      This is the basic sampling function for all hit-and-run samplers.

      :param n: The minimum number of samples that are generated at once.
      :type n: int
      :param fluxes: Whether to return fluxes or the internal solver variables. If
                     set to False, will return a variable for each forward and
                     backward flux as well as all additional variables you might
                     have defined in the model (default True).
      :type fluxes: bool, optional

      :returns: Returns a pandas DataFrame with at least `n` rows, each
                containing a flux sample.
      :rtype: pandas.DataFrame

      .. rubric:: Notes

      The runtime of this function scales linearly with the number
      of reactions in your model and the thinning factor.

      If the number of processes is larger than one, computation is split
      across the CPU cores of your machine. This may shorten computation
      time. However, setting up the parallel computation primitives also
      incurs overhead, so we recommend calculating large numbers of
      samples at once (`n` > 1000).


   .. py:method:: __getstate__() -> Dict

      Return the object for serialization.


.. py:function:: sample(model: cobra.Model, n: int, method: str = 'optgp', thinning: int = 100, processes: int = 1, seed: Optional[int] = None) -> pandas.DataFrame

   Sample valid flux distributions from a cobra model.

   Currently, two methods are supported:

   1. 'optgp' (default) which uses the OptGPSampler that supports parallel
      sampling. Requires large numbers of samples to be performant
      (`n` > 1000). For smaller samples, 'achr' might be better suited.
      For details, refer to [1]_.

   2. 'achr' which uses artificial centering hit-and-run. This is a
      single-process method with good convergence. For details, refer to [2]_.

   :param model: The model from which to sample flux distributions.
   :type model: cobra.Model
   :param n: The number of samples to obtain. When using 'optgp', this should
             be a multiple of `processes`; otherwise, a larger number of
             samples will be returned.
   :type n: int
   :param method: The sampling algorithm to use (default "optgp").
   :type method: {"optgp", "achr"}, optional
   :param thinning: The thinning factor of the generated sampling chain. A thinning of
                    10 means samples are returned every 10 steps. Defaults to 100 which
                    in benchmarks gives approximately uncorrelated samples. If set to 1
                    will return all iterates (default 100).
   :type thinning: int, optional
   :param processes: Only used for 'optgp'. The number of processes used to generate
                     samples (default 1).
   :type processes: int, optional
   :param seed: Sets the random number seed. Initialized to the current time stamp
                if None (default None).
   :type seed: int > 0, optional

   :returns: The generated flux samples. Each row corresponds to a sample of the
             fluxes and the columns are the reactions.
   :rtype: pandas.DataFrame

   .. rubric:: Notes

   The samplers have a correction method to ensure equality feasibility for
   long-running chains. However, this only works for homogeneous models,
   meaning models with no non-zero fixed variables or constraints (the
   right-hand sides of the equalities are zero).

   .. rubric:: References

   .. [1] Megchelenbrink W, Huynen M, Marchiori E (2014)
      optGpSampler: An Improved Tool for Uniformly Sampling the Solution-Space
      of Genome-Scale Metabolic Networks.
      PLoS ONE 9(2): e86587.
      https://doi.org/10.1371/journal.pone.0086587

   .. [2] Direction Choice for Accelerated Convergence in Hit-and-Run Sampling
      David E. Kaufman, Robert L. Smith
      Operations Research 1998 46:1, 84-95
      https://doi.org/10.1287/opre.46.1.84