:py:mod:`cobra.sampling.optgp`
==============================

.. py:module:: cobra.sampling.optgp

.. autoapi-nested-parse::

   Provide the OptGP sampler class and helper functions.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   cobra.sampling.optgp.OptGPSampler


.. py:class:: OptGPSampler(model: cobra.Model, thinning: int = 100, processes: Optional[int] = None, nproj: Optional[int] = None, seed: Optional[int] = None, **kwargs)


   Bases: :py:obj:`cobra.sampling.hr_sampler.HRSampler`

   Improved Artificial Centering Hit-and-Run sampler.

   A sampler with fast convergence that runs its sampling chains in parallel.
   See [1]_ for details.

   :param model: The cobra model from which to generate samples.
   :type model: cobra.Model
   :param processes: The number of processes used during sampling
                     (default cobra.Configuration.processes).
   :type processes: int, optional
   :param thinning: The thinning factor of the generated sampling chain. A thinning of
                    10 means samples are returned every 10 steps (default 100).
   :type thinning: int, optional
   :param nproj: How often to reproject the sampling point into the feasibility
                 space. Avoids numerical issues at the cost of slower sampling. If
                 you observe many equality constraint violations with
                 `sampler.validate`, you should lower this number (default None).
   :type nproj: int > 0, optional
   :param seed: Sets the random number seed. Initialized to the current time stamp
                if None (default None).
   :type seed: int > 0, optional

   .. attribute:: n_samples

      The total number of samples that have been generated by this
      sampler instance.

      :type: int

   .. attribute:: problem

      A NamedTuple whose attributes define the entire sampling problem in
      matrix form.

      :type: typing.NamedTuple

   .. attribute:: warmup

      A numpy matrix with as many columns as reactions in the model and
      more than 3 rows containing a warmup sample in each row. None if no
      warmup points have been generated yet.

      :type: numpy.matrix

   .. attribute:: retries

      The overall number of sampling retries the sampler has observed.
      Larger values indicate numerical instabilities.

      :type: int

   .. attribute:: fwd_idx

      A numpy array having one entry for each reaction in the model,
      containing the index of the respective forward variable.

      :type: numpy.array

   .. attribute:: rev_idx

      A numpy array having one entry for each reaction in the model,
      containing the index of the respective reverse variable.

      :type: numpy.array

   .. attribute:: prev

      The current/last flux sample generated.

      :type: numpy.array

   .. attribute:: center

      The center of the sampling space as estimated by the mean of all
      previously generated samples.

      :type: numpy.array

   .. rubric:: Notes

   The sampler is very similar to artificial centering, except that each
   process samples its own chain. Initial points are chosen randomly from
   the warmup points and then linearly transformed so that they are pulled
   slightly towards the center of the sampling space.
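
   As a minimal sketch of such a pull towards the center, the start point can
   be written as a convex combination of a randomly chosen warmup point and
   the center. The function names and the pull factor of 0.95 are illustrative
   assumptions, not values taken from the sampler itself.

   .. code-block:: python

      import random


      def pull_towards_center(point, center, pull=0.95):
          """Return pull * point + (1 - pull) * center, coordinate by coordinate."""
          return [pull * p + (1.0 - pull) * c for p, c in zip(point, center)]


      def pick_start(warmup, center, rng, pull=0.95):
          """Pick a random warmup point and move it slightly towards the center."""
          point = rng.choice(warmup)
          return pull_towards_center(point, center, pull)


      # Hypothetical two-reaction example with a single warmup point.
      rng = random.Random(0)
      start = pick_start([[10.0, -4.0]], [0.0, 0.0], rng, pull=0.5)

   With ``pull=1.0`` the start point is the warmup point itself, and with
   ``pull=0.0`` it collapses onto the center.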

   If the requested number of samples is not a multiple of the number of
   processes, it is rounded up to the smallest multiple of the number of
   processes that is at least as large as the requested number. For instance,
   if you have 3 processes and request 8 samples, you will receive 9.
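
   The adjustment of the sample count can be sketched as a rounding-up step.
   The helper name ``adjusted_sample_count`` is hypothetical, and the sketch
   assumes that a request which is already a multiple of the number of
   processes is left unchanged.

   .. code-block:: python

      import math


      def adjusted_sample_count(n, processes):
          """Round n up to the nearest multiple of the number of processes."""
          per_process = math.ceil(n / processes)  # samples drawn by each process
          return per_process * processes


      # The example from the text: 3 processes and 8 requested samples.
      total = adjusted_sample_count(8, 3)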

   Memory usage is roughly on the order of (2 * number of reactions)^2
   due to the required nullspace matrices and warmup points, so large
   models can easily take up a few GB of RAM. However, most of the large
   matrices are kept in shared memory, so RAM usage is independent of
   the number of processes.
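
   For a rough sense of scale, the quadratic estimate can be turned into a
   back-of-the-envelope calculation. This sketch assumes a single dense matrix
   of 8-byte floating point values; the helper name is hypothetical and the
   actual constant factor may differ.

   .. code-block:: python

      def estimate_matrix_bytes(n_reactions, bytes_per_value=8):
          """Estimate the size of one dense (2 * n_reactions)^2 matrix in bytes."""
          n_variables = 2 * n_reactions  # one forward and one reverse variable each
          return n_variables ** 2 * bytes_per_value


      small = estimate_matrix_bytes(1_000)   # 32,000,000 bytes, about 32 MB
      large = estimate_matrix_bytes(10_000)  # 3,200,000,000 bytes, about 3.2 GB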

   .. rubric:: References

   .. [1] Megchelenbrink W, Huynen M, Marchiori E (2014)
      optGpSampler: An Improved Tool for Uniformly Sampling the Solution-Space
      of Genome-Scale Metabolic Networks.
      PLoS ONE 9(2): e86587.
      https://doi.org/10.1371/journal.pone.0086587

   .. py:method:: sample(n: int, fluxes: bool = True) -> pandas.DataFrame

      Generate a set of samples.

      This is the basic sampling function for all hit-and-run samplers.

      :param n: The minimum number of samples that are generated at once.
      :type n: int
      :param fluxes: Whether to return fluxes or the internal solver variables. If
                     set to False, will return a variable for each forward and
                     backward flux as well as all additional variables you might
                     have defined in the model (default True).
      :type fluxes: bool, optional

      :returns: A pandas DataFrame with at least `n` rows, each containing a
                flux sample.
      :rtype: pandas.DataFrame

      .. rubric:: Notes

      The run time of this function depends linearly on the number
      of reactions in your model and on the thinning factor.
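
      The effect of the thinning factor can be illustrated with a small
      sketch. It assumes that one sample is kept per ``thinning`` steps, as
      described for the ``thinning`` parameter; the helper names are
      hypothetical.

      .. code-block:: python

         def thin(chain, thinning):
             """Keep every `thinning`-th state of a chain, counting from 1."""
             return chain[thinning - 1::thinning]


         def steps_needed(n, thinning):
             """Number of chain steps needed to return n thinned samples."""
             return n * thinning


         chain = list(range(1, 31))  # stand-in for 30 consecutive chain states
         kept = thin(chain, 10)      # keeps states 10, 20 and 30
         steps = steps_needed(1000, 100)

      Doubling the thinning factor therefore doubles the number of steps
      needed for the same number of returned samples.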

      If the number of processes is larger than one, computation is split
      across the CPU cores of your machine. This may shorten computation
      time. However, setting up the parallel computation primitives also
      incurs overhead, so we recommend generating large numbers of samples
      at once (`n` > 1000).


   .. py:method:: __getstate__() -> Dict

      Return the object for serialization.