pypesto.sample

Sample

Draw samples from the distribution, with support for various samplers.

class pypesto.sample.AdaptiveMetropolisSampler[source]

Bases: MetropolisSampler

Metropolis-Hastings sampler with adaptive proposal covariance.

A core problem of the standard Metropolis-Hastings sampler is its fixed proposal distribution, which must be manually tuned. This sampler adapts the proposal distribution during the sampling process based on previous samples. It adapts the correlation structure and the scaling factor of the proposal distribution. For both parts, there exist a variety of methods, see

Ballnus et al. 2017. Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems (https://doi.org/10.1186/s12918-017-0433-1)
Andrieu et al. 2008. A tutorial on adaptive MCMC (https://doi.org/10.1007/s11222-008-9110-y)

for a review.

Here, we approximate the covariance matrix via a weighted average of current and earlier samples, with a decay factor determining the relative contribution of the current sample and earlier ones to the weighted average of mean and covariance. The scaling factor we aim to converge to a fixed target acceptance rate of 0.234, as suggested by theoretical results. The implementation is based on:

Lacki et al. 2015. State-dependent swap strategies and automatic reduction of number of temperatures in adaptive parallel tempering algorithm (https://doi.org/10.1007/s11222-015-9579-0)
Miasojedow et al. 2013. An adaptive parallel tempering algorithm (https://doi.org/10.1080/10618600.2013.778779)

In turn, these are based on adaptive MCMC as discussed in:

Haario et al. 2001. An adaptive Metropolis algorithm (https://doi.org/10.2307/3318737)

For reference matlab implementations see:

__init__(options=None)[source]

Parameters:: options (dict)

classmethod default_options()[source]: Return the default options for the sampler.

initialize(problem, x0)[source]

Initialize the sampler.

Parameters:

problem (Problem)
x0 (ndarray)

class pypesto.sample.AdaptiveParallelTemperingSampler[source]

Bases: ParallelTemperingSampler

Parallel tempering sampler with adaptive temperature adaptation.

Compared to the base class, this sampler adapts the temperatures during the sampling process. This both simplifies the setup as it avoids manual tuning, and improves the performance as the temperatures are adapted to the current state of the chains.

This implementation is based on:

Vousden et al. 2016. Dynamic temperature selection for parallel tempering in Markov chain Monte Carlo simulations (https://doi.org/10.1093/mnras/stv2422),

via a matlab reference implementation (https://github.com/ICB-DCM/PESTO/blob/master/private/performPT.m).

adjust_betas(i_sample, swapped)[source]

Update temperatures as in Vousden2016.

Parameters:

i_sample (int)
swapped (Sequence[bool])

classmethod default_options()[source]

Get default options for sampler.

Return type:: dict

class pypesto.sample.DynestySampler[source]

Bases: Sampler

Use dynesty for sampling.

NB: get_samples returns MCMC-like samples, by resampling original dynesty samples according to their importance weights. This is because the original samples contain many low-likelihood samples. To work with the original samples, modify the results object with pypesto_result.sample_result = sampler.get_original_samples(), where sampler is an instance of pypesto.sample.DynestySampler. The original dynesty results object is available at sampler.raw_results.

NB: the dynesty samplers can be customized significantly, by providing sampler_args and run_args to your pypesto.sample.DynestySampler() call. For example, code to parallelize dynesty is provided in pyPESTO’s sampler_study.ipynb notebook.

Wrapper around https://dynesty.readthedocs.io/en/stable/, see there for details.

__init__(sampler_args=None, run_args=None, dynamic=True, objective_type='neglogpost', prior_transform=None)[source]

Initialize sampler.

Parameters:

sampler_args (dict) – Further keyword arguments that are passed on to the __init__ method of the dynesty sampler.
run_args (dict) – Further keyword arguments that are passed on to the run_nested method of the dynesty sampler.
dynamic (bool) – Whether to use dynamic or static nested sampling.
objective_type (str) – The objective to optimize (as defined in pypesto.problem). Either pypesto.C.OBJECTIVE_NEGLOGLIKE or pypesto.C.OBJECTIVE_NEGLOGPOST. If pypesto.C.OBJECTIVE_NEGLOGPOST, then x_priors have to be defined in the problem.
prior_transform (callable) – A function converting a sample from the unit cube to actual prior. If not provided, the default prior_transform function is used, which assumes uniform priors.

get_original_samples()[source]

Get the samples into the fitting pypesto format.

Return type:: McmcPtResult
Returns:: The pyPESTO sample result.

get_samples()[source]

Get MCMC-like samples into the fitting pypesto format.

Return type:: McmcPtResult
Returns:: The pyPESTO sample result.

initialize(problem, x0=None)[source]

Initialize the sampler.

Return type:

None

Parameters:

problem (Problem)
x0 (ndarray | list[ndarray])

loglikelihood(x)[source]: Log-probability density function.

prior_transform_from_uniform(prior_sample)[source]

Transform prior sample from unit cube to pyPESTO prior.

Parameters:: prior_sample (ndarray) – The prior sample, provided by dynesty.
Return type:: ndarray
Returns:: The transformed prior sample.

property raw_results: Get the raw dynesty results.

restore_internal_sampler(filename)[source]

Restore the state of the internal dynesty sampler.

Parameters:: filename (str) – The internal sampler will be saved here.
Return type:: None

property results: Deprecated in favor of raw_results.

sample(n_samples, beta=None)[source]

Return the most recent sample state.

Return type:

None

Parameters:

n_samples (int)
beta (float)

save_internal_sampler(filename)[source]

Save the state of the internal dynesty sampler.

This makes it easier to analyze the original dynesty samples, after sampling, with restore_internal.

Parameters:: filename (str) – The internal sampler will be saved here.
Return type:: None

class pypesto.sample.EmceeSampler[source]

Bases: Sampler

Use emcee for sampling.

Wrapper around https://emcee.readthedocs.io/en/stable/, see there for details.

__init__(nwalkers=1, sampler_args=None, run_args=None)[source]

Initialize sampler.

Parameters:

nwalkers (int) – The number of walkers in the ensemble.
sampler_args (dict) – Further keyword arguments that are passed on to emcee.EnsembleSampler.__init__.
run_args (dict) – Further keyword arguments that are passed on to emcee.EnsembleSampler.run_mcmc.

get_epsilon_ball_initial_state(center, problem, epsilon=0.001)[source]

Get walker initial positions as samples from an epsilon ball.

The ball is scaled in each direction according to the magnitude of the center in that direction.

It is assumed that, because vectors are generated near a good point, all generated vectors are evaluable, so evaluability is not checked.

Points that are generated outside the problem bounds will get shifted to lie on the edge of the problem bounds.

Parameters:

center (ndarray) – The center of the epsilon ball. The dimension should match the full dimension of the pyPESTO problem. This will be returned as the first position.
problem (Problem) – The pyPESTO problem.
epsilon (float) – The relative radius of the ball. e.g., if epsilon=0.5 and the center of the first dimension is at 100, then the upper and lower bounds of the epsilon ball in the first dimension will be 150 and 50, respectively.

get_samples()[source]

Get the samples into the fitting pypesto format.

Return type:: McmcPtResult

initialize(problem, x0)[source]

Initialize the sampler.

It is recommended to initialize walkers

Parameters:

x0 (ndarray | list[ndarray]) – The “a priori preferred position”. e.g., an optimized parameter vector. https://emcee.readthedocs.io/en/stable/user/faq/ The position of the first walker will be this, the remaining walkers will be assigned positions uniformly in a smaller ball around this vector. Alternatively, a set of vectors can be provided, which will be used to initialize walkers. In this case, any remaining walkers will be initialized at points sampled uniformly within the problem bounds.
problem (Problem)

Return type:

None

sample(n_samples, beta=1.0)[source]

Return the most recent sample state.

Return type:

None

Parameters:

n_samples (int)
beta (float)

class pypesto.sample.InternalSampler[source]

Bases: Sampler

Sampler to be used inside a parallel tempering sampler.

The last sample can be obtained via get_last_sample and set via set_last_sample.

abstractmethod get_last_sample()[source]

Get the last sample in the chain.

Return type:: InternalSample
Returns:: internal_sample – The last sample in the chain in the exchange format.

make_internal(temper_lpost)[source]

Allow the inner samplers to be used as inner samplers.

Can be called by parallel tempering samplers during initialization. Default: Do nothing.

Parameters:: temper_lpost (bool) – Whether to temperate the posterior or only the likelihood.

abstractmethod set_last_sample(sample)[source]

Set the last sample in the chain to the passed value.

Parameters:: sample (InternalSample) – The sample that will replace the last sample in the chain.

class pypesto.sample.MetropolisSampler[source]

Bases: InternalSampler

Simple Metropolis-Hastings sampler with fixed proposal variance.

The Metropolis-Hastings sampler is a Markov chain Monte Carlo (MCMC) method generating a sequence of samples from a probability distribution.

This class implements a simple Metropolis algorithm with fixed symmetric Gaussian proposal distribution.

For the underlying original publication, see:

Metropolis et al. 1953. Equation of State Calculations by Fast Computing Machines (https://doi.org/10.1063/1.1699114)
Hastings 1970. Monte Carlo sampling methods using Markov chains and their applications (https://doi.org/10.1093/biomet/57.1.97)

For reference matlab implementations see:

__init__(options=None)[source]

Parameters:: options (dict)

classmethod default_options()[source]: Return the default options for the sampler.

get_last_sample()[source]

Get the last sample in the chain.

Return type:: InternalSample
Returns:: internal_sample – The last sample in the chain in the exchange format.

get_samples()[source]

Get the samples into the fitting pypesto format.

Return type:: McmcPtResult

initialize(problem, x0)[source]

Initialize the sampler.

Parameters:

problem (Problem)
x0 (ndarray)

make_internal(temper_lpost)[source]

Allow the inner samplers to be used as inner samplers.

Can be called by parallel tempering samplers during initialization. Default: Do nothing.

Parameters:: temper_lpost (bool) – Whether to temperate the posterior or only the likelihood.

sample(n_samples, beta=1.0)[source]

Load last recorded particle.

Parameters:

n_samples (int)
beta (float)

set_last_sample(sample)[source]

Set the last sample in the chain to the passed value.

Parameters:: sample (InternalSample) – The sample that will replace the last sample in the chain.

class pypesto.sample.ParallelTemperingSampler[source]

Bases: Sampler

Simple parallel tempering sampler.

Parallel tempering is a Markov chain Monte Carlo (MCMC) method that uses multiple chains with different temperatures to sample from a probability distribution. The chains are coupled by swapping samples between them. This allows to sample from distributions with multiple modes more efficiently, as high-temperature chains can jump between modes, while low-temperature chains can sample the modes more precisely.

This implementation is based on:

Vousden et al. 2016. Dynamic temperature selection for parallel tempering in Markov chain Monte Carlo simulations (https://doi.org/10.1093/mnras/stv2422),

via a matlab-based reference implementation (https://github.com/ICB-DCM/PESTO/blob/master/private/performPT.m).

__init__(internal_sampler, betas=None, n_chains=None, options=None)[source]

Parameters:

internal_sampler (InternalSampler)
betas (Sequence[float])
n_chains (int)
options (dict)

adjust_betas(i_sample, swapped)[source]

Adjust temperature values. Default: Do nothing.

Parameters:

i_sample (int)
swapped (Sequence[bool])

classmethod default_options()[source]

Return the default options for the sampler.

Return type:: dict

get_samples()[source]

Concatenate all chains.

Return type:: McmcPtResult

initialize(problem, x0)[source]

Initialize all samplers.

Parameters:

problem (Problem)
x0 (ndarray | list[ndarray])

sample(n_samples, beta=1.0)[source]

Sample and swap in between samplers.

Parameters:

n_samples (int)
beta (float)

swap_samples()[source]

Swap samples as in Vousden2016.

Return type:: Sequence[bool]

class pypesto.sample.PymcSampler[source]

Bases: Sampler

Use pymc for sampling.

Wrapper around Pymc https://www.pymc.io/welcome.html samplers.

Parameters:

step_function – A pymc step function, e.g. NUTS, Slice. If not specified, pymc determines one automatically (preferable).
**kwargs – Options are directly passed on to pymc.sample.

__init__(step_function=None, post_compute_fval=True, **kwargs)[source]

Parameters:: post_compute_fval (bool)

get_samples()[source]

Convert result from pymc to McmcPtResult.

Return type:: McmcPtResult

initialize(problem, x0)[source]

Initialize the sampler.

Parameters:

problem (Problem) – The problem for which to sample.
x0 (ndarray) – Should, but is not required to, be used as initial parameter.

sample(n_samples, beta=1.0)[source]

Sample the problem.

Parameters:

n_samples (int) – Number of samples to be computed.
beta (float) – Inverse temperature for tempering (default: 1.0).

classmethod translate_options(options)[source]

Translate options and fill in defaults.

Parameters:: options – Options configuring the sampler.

class pypesto.sample.Sampler[source]

Bases: ABC

Sampler base class, not functional on its own.

The sampler maintains an internal chain, which is initialized in initialize, and updated in sample.

__init__(options=None)[source]

Parameters:: options (dict)

classmethod default_options()[source]

Set/Get default options.

Return type:: dict
Returns:: default_options – Default sampler options.

abstractmethod get_samples()[source]

Get the generated samples.

Return type:: McmcPtResult

abstractmethod initialize(problem, x0)[source]

Initialize the sampler.

Parameters:

problem (Problem) – The problem for which to sample.
x0 (ndarray | list[ndarray]) – Should, but is not required to, be used as initial parameter.

abstractmethod sample(n_samples, beta=1.0)[source]

Perform sampling.

Parameters:

n_samples (int) – Number of samples to generate.
beta (float) – Inverse of the temperature to which the system is elevated.

classmethod translate_options(options)[source]

Translate options and fill in defaults.

Parameters:: options – Options configuring the sampler.

pypesto.sample.auto_correlation(result)[source]

Calculate the autocorrelation of the MCMC chains.

Parameters:: result (Result) – The pyPESTO result object with filled sample result.
Return type:: float
Returns:: auto_correlation – Estimate of the integrated autocorrelation time of the MCMC chains.

pypesto.sample.bridge_sampling_log_evidence(result, n_posterior_samples_init=None, initial_guess_log_evidence=None, max_iter=1000, tol=1e-06)[source]

Compute the log evidence using bridge sampling.

Based on “A Tutorial on Bridge Sampling” by Gronau et al. (2017): https://doi.org/10.1016/j.jmp.2017.09.005. Using the optimal bridge function by Meng and Wong (1996) which minimises the relative mean-squared error. Proposal function is calibrated using posterior samples, which are not used for the final bridge estimate (as this may result in an underestimation of the marginal likelihood, see Overstall and Forster (2010)).

Parameters:

result (Result) – The pyPESTO result object with filled sample result.
n_posterior_samples_init (int | None) – Number of samples used to calibrate the proposal function. By default, half of the posterior samples are used.
initial_guess_log_evidence (float | None) – Initial guess for the log evidence. By default, the Laplace approximation is used to compute the initial guess.
max_iter (int) – Maximum number of iterations. Default is 1000.
tol (float) – Tolerance for convergence. Default is 1e-6.

Return type:

float

Returns:

log_evidence

pypesto.sample.calculate_ci_mcmc_sample(result, ci_level=0.95, exclude_burn_in=True)[source]

Calculate parameter credibility intervals based on MCMC samples.

Parameters:

result (Result) – The pyPESTO result object with filled sample result.
ci_level (float) – Lower tail probability, defaults to 95% interval.
exclude_burn_in (bool) – Whether to exclude the burn-in samples.

Return type:

tuple[ndarray, ndarray]

Returns:

lb, ub – Bounds of the MCMC percentile-based confidence interval.

pypesto.sample.calculate_ci_mcmc_sample_prediction(simulated_values, ci_level=0.95)[source]

Calculate prediction credibility intervals based on MCMC samples.

Parameters:

simulated_values (ndarray) – Simulated model states or model observables.
ci_level (float) – Lower tail probability, defaults to 95% interval.

Return type:

tuple[ndarray, ndarray]

Returns:

lb, ub – Bounds of the MCMC-based prediction confidence interval.

pypesto.sample.effective_sample_size(result)[source]

Calculate the effective sample size of the MCMC chains.

Parameters:: result (Result) – The pyPESTO result object with filled sample result.
Return type:: float
Returns:: ess – Estimate of the effective sample size of the MCMC chains.

pypesto.sample.geweke_test(result, zscore=2.0, chain_number=0)[source]

Calculate the burn-in of MCMC chains.

Parameters:

result (Result) – The pyPESTO result object with filled sample result.
zscore (float) – The Geweke test threshold.
chain_number (int) – The chain number to be used for the Geweke test (in a parallel tempering setting). Usually we are only interested in the first chain.

Return type:

int

Returns:

burn_in – Iteration where the first and the last fraction of the chain do not differ significantly regarding Geweke test -> Burn-In

pypesto.sample.harmonic_mean_log_evidence(result, prior_samples=None, neg_log_likelihood_fun=None)[source]

Compute the log evidence using the harmonic mean estimator.

Stabilized harmonic mean estimator is used if prior samples are provided. Newton and Raftery (1994): https://doi.org/10.1111/j.2517-6161.1994.tb01956.x

Parameters:

result (Result)
prior_samples (ndarray | None) – Samples from the prior distribution. If samples from the prior are provided, the stabilized harmonic mean is computed (recommended). Then, the likelihood function must be provided as well.
neg_log_likelihood_fun (Callable | None) – Function to evaluate the negative log likelihood. Necessary if prior_samples is not None.

Return type:

float

Returns:

log_evidence

pypesto.sample.laplace_approximation_log_evidence(problem, x)[source]

Compute the log evidence using the Laplace approximation.

The objective in your problem must be a negative log posterior, and support Hessian computation.

Parameters:

problem (Problem) – The problem to compute the log evidence for.
x (ndarray) – The maximum a posteriori estimate at which to compute the log evidence.

Return type:

float

Returns:

log_evidence (float)

pypesto.sample.parallel_tempering_log_evidence(result, method='trapezoid', use_all_chains=True)[source]

Perform thermodynamic integration or steppingstone sampling to estimate the log evidence.

Thermodynamic integration is performed by integrating the mean log likelihood over the temperatures. Errors might come from the samples itself or the numerical integration. Steppingstone sampling is a form of importance sampling that uses the maximum likelihood of each temperature. It does not require an integration, but can be biased for a small number of temperatures. See (Annis et al., 2019), https://doi.org/10.1016/j.jmp.2019.01.005, for more details.

This should be used with a beta decay temperature schedule and not with the adaptive version of: parallel tempering sampling as the temperature schedule is not optimal for thermodynamic integration.

Parameters:

result (Result) – Result object containing the samples.
method (str) – Integration method, either ‘trapezoid’ or ‘simpson’ to perform thermodynamic integration (uses scipy for integration) or ‘steppingstone’ to perform steppingstone sampling.
use_all_chains (bool) – If True, calculate burn-in for each chain and use the maximal burn-in for all chains for the integration. This will fail if not all chains have converged yet. Otherwise, use only the converged chains for the integration (might increase the integration error).

Return type:

float | None

pypesto.sample.sample(problem, n_samples, sampler=None, x0=None, result=None, warm_start=1.0, filename=None, overwrite=False)[source]

Call to do parameter sampling.

Parameters:

problem (Problem) – The problem to be solved. If None is provided, a pypesto.AdaptiveMetropolisSampler is used.
n_samples (int | None) – Number of samples to generate. None can be used if the sampler does not use n_samples.
sampler (Sampler) – The sampler to perform the actual sampling.
x0 (ndarray | list[ndarray]) – Initial parameter for the Markov chain. If None, the best parameter found in optimization is used. Note that some samplers require an initial parameter, some may ignore it. x0 can also be a list, to have separate starting points for parallel tempering chains.
warm_start (float) – Whether to warm start from previous optimization results stored in result or to sample from the prior. The value is the convex combination between the two, with 1.0 meaning only warm start and 0.0 meaning only sample from prior. If x0 is provided, warm_start is ignored.
result (Result) – A result to write to. If None provided, one is created from the problem.
filename (str | Callable | None) – Name of the hdf5 file, where the result will be saved. Default is None, which deactivates automatic saving. If set to “Auto” it will automatically generate a file named year_month_day_profiling_result.hdf5. Optionally a method, see docs for pypesto.store.auto.autosave.
overwrite (bool) – Whether to overwrite result/sampling in the autosave file if it already exists.

Return type:

Result

Returns:

result – A result with filled in sample_options part.