17.1.1.6.1.3. cobra.sampling.optgp
Provide OptGP sampler.
17.1.1.6.1.3.1. Module Contents
17.1.1.6.1.3.1.1. Classes
OptGPSampler: A parallel optimized sampler.
class cobra.sampling.optgp.OptGPSampler(model, processes=None, thinning=100, nproj=None, seed=None)
Bases: cobra.sampling.hr_sampler.HRSampler
A parallel optimized sampler.
A parallel sampler with fast convergence and parallel execution. See [1] for details.
- Parameters
model (cobra.Model) – The cobra model from which to generate samples.
processes (int, optional (default Configuration.processes)) – The number of processes used during sampling.
thinning (int, optional) – The thinning factor of the generated sampling chain. A thinning of 10 means samples are returned every 10 steps.
nproj (int > 0, optional) – How often to reproject the sampling point into the feasibility space. Avoids numerical issues at the cost of slower sampling. If you observe many equality constraint violations with sampler.validate, you should lower this number.
seed (int > 0, optional) – Sets the random number seed. Initialized to the current time stamp if None.
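As a minimal sketch of what the thinning parameter means, the snippet below keeps every k-th state of a toy chain. The helper thin_chain and the toy data are hypothetical and not part of cobra; the sampler applies thinning internally, so this only illustrates the idea.

```python
def thin_chain(chain, thinning):
    """Keep every `thinning`-th state of a chain (hypothetical helper, not cobra API)."""
    if thinning < 1:
        raise ValueError("thinning must be a positive integer")
    # Steps k, 2k, 3k, ... are kept; indices are zero-based, so start at k - 1.
    return chain[thinning - 1::thinning]


# A toy chain of 50 consecutive steps, labeled 1..50.
steps = list(range(1, 51))
kept = thin_chain(steps, 10)
print(kept)  # [10, 20, 30, 40, 50]
```

With a thinning of 10, the 50 steps above yield 5 returned samples, so the number of underlying steps grows in proportion to the thinning factor.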
- model: The cobra model from which the samples get generated.
- Type
- problem: A python object whose attributes define the entire sampling problem in matrix form. See docstring of Problem.
- Type
collections.namedtuple
- warmup: A matrix with as many columns as reactions in the model and more than 3 rows containing a warmup sample in each row. None if no warmup points have been generated yet.
- Type
- retries: The overall number of sampling retries the sampler has observed. Larger values indicate numerical instabilities.
- Type
- seed: Sets the random number seed. Initialized to the current time stamp if None.
- Type
int > 0, optional
- fwd_idx: Has one entry for each reaction in the model containing the index of the respective forward variable.
- Type
numpy.array
- rev_idx: Has one entry for each reaction in the model containing the index of the respective reverse variable.
- Type
numpy.array
- prev: The current/last flux sample generated.
- Type
numpy.array
- center: The center of the sampling space as estimated by the mean of all previously generated samples.
- Type
numpy.array
Notes
The sampler is very similar to artificial centering where each process samples its own chain. Initial points are chosen randomly from the warmup points followed by a linear transformation that pulls the points a little bit towards the center of the sampling space.
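The choice of initial points described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the sampler's actual code: the helper names, the shrink factor of 0.95, and the use of the warmup mean as the center are all hypothetical.

```python
import random


def pull_towards_center(point, center, shrink=0.95):
    """Move `point` towards `center` along the line connecting them.

    `shrink` is an assumed factor for illustration: 1.0 leaves the point
    unchanged and 0.0 moves it onto the center.
    """
    return [c + shrink * (p - c) for p, c in zip(point, center)]


def initial_point(warmup, center, rng):
    """Pick a random warmup row and pull it slightly towards the center."""
    start = rng.choice(warmup)
    return pull_towards_center(start, center)


# Toy data: three warmup points in a two-dimensional flux space.
warmup = [[0.0, 10.0], [10.0, 0.0], [10.0, 10.0]]
# Assumption: the center is taken as the column-wise mean of the warmup points.
center = [sum(col) / len(warmup) for col in zip(*warmup)]
start = initial_point(warmup, center, random.Random(42))
print(start)
```

Each process would draw its own starting point this way, so the chains begin at different locations that sit slightly inside the boundary of the sampling space.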
If the number of processes used is larger than one, the requested number of samples is adjusted to the smallest multiple of the number of processes larger than the requested sample number. For instance, if you have 3 processes and request 8 samples, you will receive 9.
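The adjustment of the sample count can be reproduced with a few lines of arithmetic. The helper adjusted_sample_count is hypothetical and not part of cobra, and it assumes that a request that is already an exact multiple of the process count is left unchanged.

```python
import math


def adjusted_sample_count(n, processes):
    """Round n up to a multiple of processes (illustrative helper, not cobra API)."""
    if processes <= 1:
        # A single process returns exactly the requested number of samples.
        return n
    # Each process generates the same number of samples.
    per_process = math.ceil(n / processes)
    return per_process * processes


print(adjusted_sample_count(8, 3))  # 9
```

In the example from the text, each of the 3 processes generates 3 samples, which gives 9 in total.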
Memory usage is roughly on the order of (2 * number of reactions)^2 due to the required nullspace matrices and warmup points, so large models can easily take up a few GB of RAM. However, most of the large matrices are kept in shared memory, so the RAM usage is independent of the number of processes.
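For a rough feel of the numbers, the sketch below evaluates the size of one dense matrix of that shape. It assumes 8-byte floating point values and a single matrix, both of which are assumptions made for illustration; it is an order-of-magnitude estimate, not a measurement of the sampler.

```python
def estimate_matrix_bytes(n_reactions, bytes_per_value=8):
    """Rough size in bytes of one dense (2 * n_reactions)^2 matrix.

    Hypothetical helper: assumes 8-byte floats and ignores any other
    data the sampler keeps in memory.
    """
    side = 2 * n_reactions
    return side * side * bytes_per_value


for n_reactions in (100, 2_000, 10_000):
    gib = estimate_matrix_bytes(n_reactions) / 1024**3
    print(f"{n_reactions:>6} reactions: ~{gib:.2f} GiB")
```

Because the estimate grows with the square of the number of reactions, a model that is ten times larger needs about a hundred times more memory.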
References
[1] Megchelenbrink W, Huynen M, Marchiori E (2014) optGpSampler: An Improved Tool for Uniformly Sampling the Solution-Space of Genome-Scale Metabolic Networks. PLoS ONE 9(2): e86587. https://doi.org/10.1371/journal.pone.0086587
- sample(self, n, fluxes=True)
Generate a set of samples.
This is the basic sampling function for all hit-and-run samplers.
- Parameters
n (int) – The minimum number of samples that are generated at once (see Notes).
fluxes (boolean) – Whether to return fluxes or the internal solver variables. If set to False will return a variable for each forward and backward flux as well as all additional variables you might have defined in the model.
- Returns
Returns a matrix with n rows, each containing a flux sample.
- Return type
Notes
Performance of this function depends linearly on the number of reactions in your model and the thinning factor.
If the number of processes is larger than one, computation is split across the CPUs of your machine. This may shorten computation time. However, there is also overhead in setting up parallel computation, so we recommend calculating large numbers of samples at once (n > 1000).