17.1.7. cobra.sampling
Submodules
Package Contents
Classes
The abstract base class for hit-and-run samplers.

Artificial Centering Hit-and-Run sampler.

Improved Artificial Centering Hit-and-Run sampler.
Functions

Create a new numpy array that resides in shared memory. 

Sample a new feasible point from the point x in direction delta. 

Sample valid flux distributions from a cobra model. 
 class cobra.sampling.HRSampler(model: cobra.Model, thinning: int, nproj: Optional[int] = None, seed: Optional[int] = None, **kwargs)
Bases: abc.ABC
The abstract base class for hit-and-run samplers.
New samplers should derive from this class where possible to provide a uniform interface.
 Parameters
model (cobra.Model) – The cobra model from which to generate samples.
thinning (int) – The thinning factor of the generated sampling chain. A thinning of 10 means samples are returned every 10 steps.
nproj (int > 0, optional) – How often to reproject the sampling point into the feasibility space. Avoids numerical issues at the cost of slower sampling. If you observe many equality constraint violations with sampler.validate, you should lower this number (default None).
seed (int > 0, optional) – Sets the random number seed. Initialized to the current time stamp if None (default None).
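As a rough illustration of what the thinning factor means, the sketch below keeps every k-th state of a chain. The helper `thin_chain` is hypothetical and not part of cobra; the samplers apply thinning internally while the chain is generated, and their bookkeeping may differ from this simplified version.

```python
from typing import List, Sequence


def thin_chain(chain: Sequence[float], thinning: int) -> List[float]:
    """Keep every `thinning`-th state of a chain (hypothetical helper)."""
    if thinning < 1:
        raise ValueError("thinning must be a positive integer")
    # Steps are counted from 1, so with thinning=10 the states reached
    # after steps 10, 20, 30, ... are kept.
    return [state for step, state in enumerate(chain, start=1)
            if step % thinning == 0]


chain = list(range(1, 31))  # stand-in for 30 consecutive chain states
samples = thin_chain(chain, 10)
print(samples)  # [10, 20, 30]
```

With a thinning of 1, every iterate is kept.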
 retries
The overall number of sampling retries the sampler has observed. Larger values indicate numerical instabilities.
 Type
int
 problem
A NamedTuple whose attributes define the entire sampling problem in matrix form.
 Type
Problem
 warmup
A numpy matrix with as many columns as reactions in the model and more than 3 rows containing a warmup sample in each row. None if no warmup points have been generated yet.
 Type
numpy.matrix
 fwd_idx
A numpy array having one entry for each reaction in the model, containing the index of the respective forward variable.
 Type
numpy.array
 rev_idx
A numpy array having one entry for each reaction in the model, containing the index of the respective reverse variable.
 Type
numpy.array
 __build_problem() → Problem
Build the matrix representation of the sampling problem.
 Returns
The matrix representation in the form of a NamedTuple.
 Return type
Problem
 generate_fva_warmup() → None
Generate the warmup points for the sampler.
Generates warmup points by setting each flux as the sole objective and minimizing/maximizing it. Also caches the projection of the warmup points into the nullspace for non-homogeneous problems (only if necessary).
 Raises
ValueError – If the flux cone contains a single point or the problem is inhomogeneous.
 _reproject(p: numpy.ndarray) → numpy.ndarray
Reproject a point into the feasibility region.
This function is guaranteed to return a new feasible point. However, no guarantee can be made in terms of proximity to the original point.
 Parameters
p (numpy.array) – The current sample point.
 Returns
A new feasible point. If p is feasible, it will return p.
 Return type
numpy.array
 _random_point() → numpy.ndarray
Find an approximately random point in the flux cone.
 _is_redundant(matrix: numpy.matrix, cutoff: Optional[float] = None) → bool
Identify redundant rows in a matrix that can be removed.
 _bounds_dist(p: numpy.ndarray) → numpy.ndarray
Get the distances of a point to its lower and upper bounds. A negative distance indicates a bound violation.
 abstract sample(n: int, fluxes: bool = True) → pandas.DataFrame
Abstract sampling function.
Should be overridden by child classes.
 Parameters
n (int) – The number of samples that are generated at once.
fluxes (bool, optional) – Whether to return fluxes or the internal solver variables. If set to False, will return a variable for each forward and backward flux as well as all additional variables you might have defined in the model (default True).
 Returns
Returns a pandas DataFrame with n rows, each containing a flux sample.
 Return type
pandas.DataFrame
 batch(batch_size: int, batch_num: int, fluxes: bool = True) → pandas.DataFrame
Create a batch generator.
This is useful to generate batch_num batches of batch_size samples each.
 Parameters
batch_size (int) – The number of samples contained in each batch.
batch_num (int) – The number of batches in the generator.
fluxes (bool, optional) – Whether to return fluxes or the internal solver variables. If set to False, will return a variable for each forward and backward flux as well as all additional variables you might have defined in the model (default True).
 Yields
pandas.DataFrame – A DataFrame with dimensions (batch_size x n_r) containing a valid flux sample for a total of n_r reactions (or variables if fluxes=False) in each row.
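The generator behavior described above can be sketched as follows. This is a minimal sketch of the documented contract, not the library's code: `batch_generator` and `fake_sample` are hypothetical names, and plain lists stand in for the DataFrames a real sampler would return.

```python
from typing import Callable, Iterator, List


def batch_generator(sample: Callable[[int], List[float]],
                    batch_size: int,
                    batch_num: int) -> Iterator[List[float]]:
    """Yield `batch_num` batches of `batch_size` samples each."""
    for _ in range(batch_num):
        # Each batch is drawn lazily, only when the caller asks for it.
        yield sample(batch_size)


def fake_sample(n: int) -> List[float]:
    # Hypothetical stand-in for a sampler's sample() method.
    return [0.0] * n


batches = list(batch_generator(fake_sample, batch_size=4, batch_num=3))
print(len(batches), [len(b) for b in batches])  # 3 [4, 4, 4]
```

Because batches are produced one at a time, only a single batch needs to be held in memory while iterating.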
 validate(samples: numpy.matrix) → numpy.ndarray
Validate a set of samples for equality and inequality feasibility.
Can be used to check whether the generated samples and warmup points are feasible.
 Parameters
samples (numpy.matrix) – The samples to be validated. Must be of dimension (samples x n_reactions) and must contain fluxes rather than internal solver variables.
 Returns
A one-dimensional numpy array containing a code of 1 to 3 letters denoting the validation result:
 ‘v’ means feasible in bounds and equality constraints
 ‘l’ means a lower bound violation
 ‘u’ means an upper bound violation
 ‘e’ means an equality constraint violation
 Return type
numpy.array
 Raises
ValueError – If the number of columns is wrong.
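To make the validation codes concrete, here is a simplified sketch that assigns them for a toy network. Everything in it is an assumption made for illustration: the function `validate_codes`, its arguments, the tolerance `tol`, and the order in which letters are concatenated are hypothetical and do not describe how cobra implements `validate`.

```python
from typing import List, Sequence


def validate_codes(samples: Sequence[Sequence[float]],
                   lower: Sequence[float],
                   upper: Sequence[float],
                   stoichiometry: Sequence[Sequence[float]],
                   tol: float = 1e-6) -> List[str]:
    """Return one validation code per sample (illustrative only)."""
    n_reactions = len(lower)
    codes = []
    for v in samples:
        if len(v) != n_reactions:
            raise ValueError("wrong number of columns")
        code = ""
        if any(x < lb - tol for x, lb in zip(v, lower)):
            code += "l"  # at least one flux is below its lower bound
        if any(x > ub + tol for x, ub in zip(v, upper)):
            code += "u"  # at least one flux is above its upper bound
        # Equality constraints: every row of S must satisfy S . v = 0.
        if any(abs(sum(s * x for s, x in zip(row, v))) > tol
               for row in stoichiometry):
            code += "e"
        codes.append(code or "v")
    return codes


# Toy network with one metabolite: reaction 0 produces it, reaction 1 consumes it.
S = [[1.0, -1.0]]
lower, upper = [0.0, 0.0], [10.0, 10.0]
samples = [[2.0, 2.0], [-1.0, -1.0], [11.0, 11.0], [3.0, 1.0], [-1.0, 11.0]]
print(validate_codes(samples, lower, upper, S))
# ['v', 'l', 'u', 'e', 'lue']
```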
Create a new numpy array that resides in shared memory.
 Parameters
 Returns
The newly created shared numpy array.
 Return type
numpy.array
 Raises
ValueError – If the size of the input data (if provided) does not match the size of the created array.
 class cobra.sampling.ACHRSampler(model: cobra.Model, thinning: int = 100, nproj: Optional[int] = None, seed: Optional[int] = None, **kwargs)
Bases: cobra.sampling.hr_sampler.HRSampler
Artificial Centering Hit-and-Run sampler.
A sampler with low memory footprint and good convergence.
 Parameters
model (cobra.Model) – The cobra model from which to generate samples.
thinning (int, optional) – The thinning factor of the generated sampling chain. A thinning of 10 means samples are returned every 10 steps (default 100).
nproj (int > 0, optional) – How often to reproject the sampling point into the feasibility space. Avoids numerical issues at the cost of slower sampling. If you observe many equality constraint violations with sampler.validate, you should lower this number (default None).
seed (int > 0, optional) – Sets the random number seed. Initialized to the current time stamp if None (default None).
 problem
A NamedTuple whose attributes define the entire sampling problem in matrix form.
 Type
Problem
 warmup
A numpy matrix with as many columns as reactions in the model and more than 3 rows containing a warmup sample in each row. None if no warmup points have been generated yet.
 Type
numpy.matrix
 retries
The overall number of sampling retries the sampler has observed. Larger values indicate numerical instabilities.
 Type
int
 fwd_idx
A numpy array having one entry for each reaction in the model, containing the index of the respective forward variable.
 Type
numpy.array
 rev_idx
A numpy array having one entry for each reaction in the model, containing the index of the respective reverse variable.
 Type
numpy.array
 prev
The current/last flux sample generated.
 Type
numpy.array
 center
The center of the sampling space as estimated by the mean of all previously generated samples.
 Type
numpy.array
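A center defined as the mean of all previously generated samples can be maintained incrementally, without storing the samples. The sketch below shows that update under stated assumptions: `update_center` is a hypothetical helper, and the sampler's own bookkeeping may differ.

```python
from typing import List, Sequence


def update_center(center: Sequence[float],
                  n_samples: int,
                  new_sample: Sequence[float]) -> List[float]:
    """Fold one more point into the mean of `n_samples` earlier points."""
    return [(c * n_samples + x) / (n_samples + 1)
            for c, x in zip(center, new_sample)]


center = [1.0, 2.0]                              # mean of a single sample
center = update_center(center, 1, [3.0, 4.0])    # mean of two samples
print(center)  # [2.0, 3.0]
```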
Notes
ACHR generates samples by choosing new directions from the sampling space’s center and the warmup points. The implementation used here is the same as in the MATLAB COBRA Toolbox [2] and uses only the initial warmup points to generate new directions, not any other previous iterations. This usually gives better mixing, since the startup points are chosen to span the space widely. This also makes the generated sampling chain quasi-Markovian, since the center converges rapidly.
Memory usage is roughly in the order of (2 * number of reactions) ^ 2 due to the required nullspace matrices and warmup points. So, large models easily take up a few GBs of RAM.
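For a back-of-the-envelope feel for this estimate, the sketch below evaluates (2 * number of reactions)^2 values. It assumes 8 bytes per stored value (double precision) and a constant factor of one; both are assumptions of this example, and `estimated_memory_gb` is a hypothetical helper, so treat the result as an order of magnitude only.

```python
def estimated_memory_gb(n_reactions: int, bytes_per_value: int = 8) -> float:
    """Rough memory estimate for (2 * n_reactions)^2 stored values."""
    n_values = (2 * n_reactions) ** 2
    return n_values * bytes_per_value / 1e9


for n in (1_000, 10_000):
    print(n, "reactions:", estimated_memory_gb(n), "GB")
```

Under these assumptions, a model with 10,000 reactions lands at roughly 3.2 GB, which is consistent with the "few GBs" stated above.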
References
 1
Kaufman DE, Smith RL (1998) Direction Choice for Accelerated Convergence in Hit-and-Run Sampling. Operations Research 46(1): 84-95. https://doi.org/10.1287/opre.46.1.84
 2
 sample(n: int, fluxes: bool = True) → pandas.DataFrame
Generate a set of samples.
This is the basic sampling function for all hit-and-run samplers.
 Parameters
n (int) – The number of samples that are generated at once.
fluxes (bool, optional) – Whether to return fluxes or the internal solver variables. If set to False, will return a variable for each forward and backward flux as well as all additional variables you might have defined in the model (default True).
 Returns
Returns a pandas DataFrame with n rows, each containing a flux sample.
 Return type
pandas.DataFrame
Notes
The runtime of this function depends linearly on the number of reactions in your model and the thinning factor.
 cobra.sampling.step(sampler: cobra.sampling.hr_sampler.HRSampler, x: numpy.ndarray, delta: numpy.ndarray, fraction: Optional[float] = None, tries: int = 0) → numpy.ndarray
Sample a new feasible point from the point x in direction delta.
This is the low-level sampling stepper for samplers derived from HRSampler. Currently, it’s used by ACHRSampler and OptGPSampler.
It’s declared outside of the base sampling class to facilitate use of multiprocessing.
 Parameters
sampler (cobra.sampling.HRSampler) – The sampler to sample a step for.
x (np.array) – A point in the sampling region.
delta (np.array) – The direction to take the step in.
fraction (float, optional) – A float controlling how far along the feasible range of alpha (the step length along delta) the step is taken (default None). If None, alpha is obtained from a normal distribution.
tries (int, optional) – Total number of tries (default 0).
 Returns
The new numpy array obtained after a step of sampling.
 Return type
np.array
 Raises
RuntimeError – If tries exceeds MAX_TRIES.
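The idea of stepping from x along delta can be sketched for the simplest case of box bounds. This is a toy version under stated assumptions, not the library's stepper: `alpha_range` and `toy_step` are hypothetical names, only lower and upper bounds are considered (no equality constraints, reprojection, or retries), and alpha is drawn uniformly from the feasible range purely for simplicity.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def alpha_range(x: Sequence[float], delta: Sequence[float],
                lower: Sequence[float], upper: Sequence[float]
                ) -> Tuple[float, float]:
    """Range of alpha for which x + alpha * delta stays within the bounds."""
    lo, hi = float("-inf"), float("inf")
    for xi, di, lb, ub in zip(x, delta, lower, upper):
        if di == 0.0:
            continue  # this coordinate does not move along delta
        a, b = (lb - xi) / di, (ub - xi) / di
        lo, hi = max(lo, min(a, b)), min(hi, max(a, b))
    return lo, hi


def toy_step(x: Sequence[float], delta: Sequence[float],
             lower: Sequence[float], upper: Sequence[float],
             fraction: Optional[float] = None,
             rng: Optional[random.Random] = None) -> List[float]:
    """Take one step from x in direction delta (illustrative only)."""
    lo, hi = alpha_range(x, delta, lower, upper)
    if not (math.isfinite(lo) and math.isfinite(hi)):
        raise ValueError("delta must move at least one bounded coordinate")
    if fraction is not None:
        # Place the point at a fixed fraction of the feasible alpha range.
        alpha = lo + fraction * (hi - lo)
    else:
        rng = rng or random.Random()
        alpha = rng.uniform(lo, hi)  # assumption: uniform draw
    return [xi + alpha * di for xi, di in zip(x, delta)]


point = toy_step([1.0, 1.0], [1.0, 0.0], [0.0, 0.0], [4.0, 4.0], fraction=0.5)
print(point)  # [2.0, 1.0]
```

Here the feasible alpha range is [-1, 3], so a fraction of 0.5 gives alpha = 1 and moves the first coordinate from 1 to 2.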
 class cobra.sampling.OptGPSampler(model: cobra.Model, thinning: int = 100, processes: Optional[int] = None, nproj: Optional[int] = None, seed: Optional[int] = None, **kwargs)
Bases: cobra.sampling.hr_sampler.HRSampler
Improved Artificial Centering Hit-and-Run sampler.
A sampler with fast convergence and parallel execution. See [1] for details.
 Parameters
model (cobra.Model) – The cobra model from which to generate samples.
processes (int, optional) – The number of processes used during sampling (default cobra.Configuration.processes).
thinning (int, optional) – The thinning factor of the generated sampling chain. A thinning of 10 means samples are returned every 10 steps (default 100).
nproj (int > 0, optional) – How often to reproject the sampling point into the feasibility space. Avoids numerical issues at the cost of slower sampling. If you observe many equality constraint violations with sampler.validate, you should lower this number (default None).
seed (int > 0, optional) – Sets the random number seed. Initialized to the current time stamp if None (default None).
 problem
A NamedTuple whose attributes define the entire sampling problem in matrix form.
 Type
Problem
 warmup
A numpy matrix with as many columns as reactions in the model and more than 3 rows containing a warmup sample in each row. None if no warmup points have been generated yet.
 Type
numpy.matrix
 retries
The overall number of sampling retries the sampler has observed. Larger values indicate numerical instabilities.
 Type
int
 fwd_idx
A numpy array having one entry for each reaction in the model, containing the index of the respective forward variable.
 Type
numpy.array
 rev_idx
A numpy array having one entry for each reaction in the model, containing the index of the respective reverse variable.
 Type
numpy.array
 prev
The current/last flux sample generated.
 Type
numpy.array
 center
The center of the sampling space as estimated by the mean of all previously generated samples.
 Type
numpy.array
Notes
The sampler is very similar to artificial centering where each process samples its own chain. Initial points are chosen randomly from the warmup points followed by a linear transformation that pulls the points a little bit towards the center of the sampling space.
If more than one process is used, the number of samples is adjusted to the smallest multiple of the number of processes that is not smaller than the requested sample number. For instance, if you have 3 processes and request 8 samples, you will receive 9.
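The adjustment of the sample count is plain integer arithmetic. A minimal sketch, assuming the count is rounded up to the next multiple of the process count (the helper name `adjusted_sample_count` is hypothetical):

```python
def adjusted_sample_count(n: int, processes: int) -> int:
    """Smallest multiple of `processes` that is at least `n`."""
    if processes < 1:
        raise ValueError("processes must be a positive integer")
    # Ceiling division using integers only, then scale back up.
    per_process = (n + processes - 1) // processes
    return per_process * processes


print(adjusted_sample_count(8, 3))     # 9
print(adjusted_sample_count(1000, 4))  # 1000
```

Requesting a multiple of the process count therefore returns exactly the number of samples asked for.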
Memory usage is roughly in the order of (2 * number of reactions)^2 due to the required nullspace matrices and warmup points. So, large models easily take up a few GBs of RAM. However, most of the large matrices are kept in shared memory. So the RAM usage is independent of the number of processes.
References
 1
Megchelenbrink W, Huynen M, Marchiori E (2014) optGpSampler: An Improved Tool for Uniformly Sampling the Solution-Space of Genome-Scale Metabolic Networks. PLoS ONE 9(2): e86587. https://doi.org/10.1371/journal.pone.0086587
 sample(n: int, fluxes: bool = True) → pandas.DataFrame
Generate a set of samples.
This is the basic sampling function for all hit-and-run samplers.
 Parameters
n (int) – The minimum number of samples that are generated at once.
fluxes (bool, optional) – Whether to return fluxes or the internal solver variables. If set to False, will return a variable for each forward and backward flux as well as all additional variables you might have defined in the model (default True).
 Returns
Returns a pandas DataFrame with n rows, each containing a flux sample.
 Return type
pandas.DataFrame
Notes
The runtime of this function depends linearly on the number of reactions in your model and the thinning factor.
If the number of processes is larger than one, computation is split across the CPU cores of your machine. This may shorten computation time. However, setting up the parallel computation primitives also adds overhead, so we recommend calculating large numbers of samples at once (n > 1000).
 __getstate__() → Dict
Return the object for serialization.
 cobra.sampling.sample(model: cobra.Model, n: int, method: str = 'optgp', thinning: int = 100, processes: int = 1, seed: Optional[int] = None) → pandas.DataFrame
Sample valid flux distributions from a cobra model.
Currently, two methods are supported:
 ‘optgp’ (default), which uses the OptGPSampler that supports parallel sampling. Requires large numbers of samples to be performant (n > 1000). For smaller samples, ‘achr’ might be better suited. For details, refer to [1].
 ‘achr’, which uses artificial centering hit-and-run. This is a single-process method with good convergence. For details, refer to [2].
 Parameters
model (cobra.Model) – The model from which to sample flux distributions.
n (int) – The number of samples to obtain. When using ‘optgp’, this should be a multiple of processes; otherwise, a larger number of samples will be returned.
method ({"optgp", "achr"}, optional) – The sampling algorithm to use (default “optgp”).
thinning (int, optional) – The thinning factor of the generated sampling chain. A thinning of 10 means samples are returned every 10 steps. Defaults to 100 which in benchmarks gives approximately uncorrelated samples. If set to 1 will return all iterates (default 100).
processes (int, optional) – Only used for ‘optgp’. The number of processes used to generate samples (default 1).
seed (int > 0, optional) – Sets the random number seed. Initialized to the current time stamp if None (default None).
 Returns
The generated flux samples. Each row corresponds to a sample of the fluxes and the columns are the reactions.
 Return type
pandas.DataFrame
Notes
The samplers have a correction method to ensure equality feasibility for long-running chains. However, this only works for homogeneous models, meaning models with no non-zero fixed variables or constraints (the right-hand sides of the equalities are zero).
References
 1
Megchelenbrink W, Huynen M, Marchiori E (2014) optGpSampler: An Improved Tool for Uniformly Sampling the Solution-Space of Genome-Scale Metabolic Networks. PLoS ONE 9(2): e86587. https://doi.org/10.1371/journal.pone.0086587
 2
Kaufman DE, Smith RL (1998) Direction Choice for Accelerated Convergence in Hit-and-Run Sampling. Operations Research 46(1): 84-95. https://doi.org/10.1287/opre.46.1.84