pypesto.sample
Sample
Draw samples from the distribution, with support for various samplers.
- class pypesto.sample.AdaptiveMetropolisSampler[source]
Bases:
MetropolisSampler
Metropolis-Hastings sampler with adaptive proposal covariance.
A core problem of the standard Metropolis-Hastings sampler is its fixed proposal distribution, which must be manually tuned. This sampler adapts the proposal distribution during the sampling process based on previous samples. It adapts the correlation structure and the scaling factor of the proposal distribution. For both parts, there exist a variety of methods, see
Ballnus et al. 2017. Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems (https://doi.org/10.1186/s12918-017-0433-1)
Andrieu et al. 2008. A tutorial on adaptive MCMC (https://doi.org/10.1007/s11222-008-9110-y)
for a review.
Here, we approximate the covariance matrix via a weighted average of current and earlier samples, with a decay factor determining the relative contribution of the current sample and earlier ones to the weighted averages of mean and covariance. The scaling factor is adapted such that the acceptance rate converges to a fixed target of 0.234, as suggested by theoretical results. The implementation is based on:
Lacki et al. 2015. State-dependent swap strategies and automatic reduction of number of temperatures in adaptive parallel tempering algorithm (https://doi.org/10.1007/s11222-015-9579-0)
Miasojedow et al. 2013. An adaptive parallel tempering algorithm (https://doi.org/10.1080/10618600.2013.778779)
In turn, these are based on adaptive MCMC as discussed in:
Haario et al. 2001. An adaptive Metropolis algorithm (https://doi.org/10.2307/3318737)
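A minimal usage sketch (the toy objective and bounds are illustrative, not part of the API):

```python
import numpy as np
import pypesto
import pypesto.sample as sample

# Toy problem: negative log posterior of a standard normal (up to a constant).
objective = pypesto.Objective(fun=lambda x: 0.5 * float(x @ x))
problem = pypesto.Problem(objective=objective, lb=[-5], ub=[5])

# Proposal covariance and scaling are adapted on the fly during sampling.
result = sample.sample(
    problem,
    n_samples=10_000,
    sampler=sample.AdaptiveMetropolisSampler(),
    x0=np.zeros(1),
)
```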
- class pypesto.sample.AdaptiveParallelTemperingSampler[source]
Bases:
ParallelTemperingSampler
Parallel tempering sampler with adaptive temperature selection.
Compared to the base class, this sampler adapts the temperatures during the sampling process. This both simplifies the setup, since it avoids manual tuning, and improves performance, since the temperatures are adapted to the current state of the chains.
This implementation is based on:
Vousden et al. 2016. Dynamic temperature selection for parallel tempering in Markov chain Monte Carlo simulations (https://doi.org/10.1093/mnras/stv2422),
via a matlab reference implementation (https://github.com/ICB-DCM/PESTO/blob/master/private/performPT.m).
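For example, a sketch combining it with an adaptive Metropolis inner sampler (the chain count here is arbitrary):

```python
import pypesto.sample as sample

# Each of the n_chains chains runs the internal sampler at its own temperature;
# the temperature ladder is adapted during sampling.
sampler = sample.AdaptiveParallelTemperingSampler(
    internal_sampler=sample.AdaptiveMetropolisSampler(),
    n_chains=3,
)
```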
- class pypesto.sample.DynestySampler[source]
Bases:
Sampler
Use dynesty for sampling.
NB: get_samples returns MCMC-like samples, obtained by resampling the original dynesty samples according to their importance weights, since the original samples contain many low-likelihood points. To work with the original samples, modify the results object with pypesto_result.sample_result = sampler.get_original_samples(), where sampler is an instance of pypesto.sample.DynestySampler. The original dynesty results object is available at sampler.raw_results.
NB: the dynesty samplers can be customized significantly by providing sampler_args and run_args to your pypesto.sample.DynestySampler() call. For example, code to parallelize dynesty is provided in pyPESTO’s sampler_study.ipynb notebook.
Wrapper around https://dynesty.readthedocs.io/en/stable/, see there for details.
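A sketch of the sample swap described above (problem construction omitted; assumes a pypesto problem with defined priors):

```python
import pypesto.sample as sample

sampler = sample.DynestySampler()
result = sample.sample(problem=problem, n_samples=None, sampler=sampler)

# result.sample_result now holds MCMC-like, importance-resampled samples.
# To work with the raw nested-sampling samples instead:
result.sample_result = sampler.get_original_samples()

# The underlying dynesty results object remains available:
raw = sampler.raw_results
```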
- __init__(sampler_args=None, run_args=None, dynamic=True, objective_type='neglogpost', prior_transform=None)[source]
Initialize sampler.
- Parameters:
  - sampler_args (dict) – Further keyword arguments that are passed on to the __init__ method of the dynesty sampler.
  - run_args (dict) – Further keyword arguments that are passed on to the run_nested method of the dynesty sampler.
  - dynamic (bool) – Whether to use dynamic or static nested sampling.
  - objective_type (str) – The objective to optimize (as defined in pypesto.problem). Either pypesto.C.OBJECTIVE_NEGLOGLIKE or pypesto.C.OBJECTIVE_NEGLOGPOST. If pypesto.C.OBJECTIVE_NEGLOGPOST, then x_priors have to be defined in the problem.
  - prior_transform (callable) – A function converting a sample from the unit cube to the actual prior. If not provided, the default prior_transform function is used, which assumes uniform priors.
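For illustration, a hedged sketch of passing dynesty options through (nlive and maxiter are standard dynesty arguments; check the dynesty docs for your version):

```python
import pypesto.C
import pypesto.sample as sample

sampler = sample.DynestySampler(
    sampler_args={"nlive": 500},    # forwarded to the dynesty sampler's __init__
    run_args={"maxiter": 10_000},   # forwarded to run_nested
    dynamic=False,                  # static nested sampling
    objective_type=pypesto.C.OBJECTIVE_NEGLOGLIKE,  # sample the likelihood
)
```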
- get_original_samples()[source]
Get the original dynesty samples in the pypesto format.
- Returns:
The pyPESTO sample result.
- get_samples()[source]
Get MCMC-like samples in the pypesto format.
- Returns:
The pyPESTO sample result.
- prior_transform_from_uniform(prior_sample)[source]
Transform prior sample from unit cube to pyPESTO prior.
- property raw_results
Get the raw dynesty results.
- property results
Deprecated in favor of raw_results.
- class pypesto.sample.EmceeSampler[source]
Bases:
Sampler
Use emcee for sampling.
Wrapper around https://emcee.readthedocs.io/en/stable/, see there for details.
- get_epsilon_ball_initial_state(center, problem, epsilon=0.001)[source]
Get walker initial positions as samples from an epsilon ball.
The ball is scaled in each direction according to the magnitude of the center in that direction.
It is assumed that, because vectors are generated near a good point, all generated vectors are evaluable, so evaluability is not checked.
Points that are generated outside the problem bounds will get shifted to lie on the edge of the problem bounds.
- Parameters:
  - center (ndarray) – The center of the epsilon ball. The dimension should match the full dimension of the pyPESTO problem. This will be returned as the first position.
  - problem (Problem) – The pyPESTO problem.
  - epsilon (float) – The relative radius of the ball, e.g., if epsilon=0.5 and the center of the first dimension is at 100, then the upper and lower bounds of the epsilon ball in the first dimension will be 150 and 50, respectively.
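A hypothetical call, assuming sampler is an EmceeSampler instance and best_x an optimized parameter vector for problem:

```python
# Walker start positions: best_x first, the rest in a scaled epsilon ball around it.
initial_state = sampler.get_epsilon_ball_initial_state(
    center=best_x, problem=problem, epsilon=0.001)
```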
- initialize(problem, x0)[source]
Initialize the sampler.
It is recommended to initialize walkers in a small ball around the a priori preferred position, e.g. an optimized parameter vector; see the description of x0 below.
- Parameters:
  - x0 (ndarray | list[ndarray]) – The “a priori preferred position”, e.g., an optimized parameter vector (https://emcee.readthedocs.io/en/stable/user/faq/). The position of the first walker will be this, the remaining walkers will be assigned positions uniformly in a smaller ball around this vector. Alternatively, a set of vectors can be provided, which will be used to initialize walkers. In this case, any remaining walkers will be initialized at points sampled uniformly within the problem bounds.
  - problem (Problem) – The pyPESTO problem.
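A usage sketch; nwalkers as a constructor argument is an assumption here, so check the EmceeSampler signature:

```python
import pypesto.sample as sample

sampler = sample.EmceeSampler(nwalkers=20)
# x0: an optimized parameter vector; the remaining walkers start in a small
# ball around it (see initialize above).
result = sample.sample(problem, n_samples=1_000, sampler=sampler, x0=best_x)
```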
- class pypesto.sample.InternalSampler[source]
Bases:
Sampler
Sampler to be used inside a parallel tempering sampler.
The last sample can be obtained via get_last_sample and set via set_last_sample.
- abstract get_last_sample()[source]
Get the last sample in the chain.
- Return type:
InternalSample
- Returns:
internal_sample – The last sample in the chain in the exchange format.
- class pypesto.sample.MetropolisSampler[source]
Bases:
InternalSampler
Simple Metropolis-Hastings sampler with fixed proposal variance.
The Metropolis-Hastings sampler is a Markov chain Monte Carlo (MCMC) method generating a sequence of samples from a probability distribution.
This class implements a simple Metropolis algorithm with fixed symmetric Gaussian proposal distribution.
For the original publications, see:
Metropolis et al. 1953. Equation of State Calculations by Fast Computing Machines (https://doi.org/10.1063/1.1699114)
Hastings 1970. Monte Carlo sampling methods using Markov chains and their applications (https://doi.org/10.1093/biomet/57.1.97)
For a description of the algorithm and a reference MATLAB implementation, see:
https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm
https://github.com/ICB-DCM/PESTO/blob/master/private/performPT.m
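The acceptance step can be sketched as follows (an illustrative standalone implementation, not pypesto’s internal code):

```python
import numpy as np

def metropolis_step(x, lpost_x, lpost, std, rng):
    """One Metropolis step with a fixed symmetric Gaussian proposal."""
    x_new = x + rng.normal(scale=std, size=x.shape)  # symmetric proposal
    lpost_new = lpost(x_new)
    # Accept with probability min(1, p(x_new) / p(x)), computed in log space.
    if np.log(rng.uniform()) < lpost_new - lpost_x:
        return x_new, lpost_new
    return x, lpost_x  # reject: keep the current state
```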
- get_last_sample()[source]
Get the last sample in the chain.
- Return type:
InternalSample
- Returns:
internal_sample – The last sample in the chain in the exchange format.
- make_internal(temper_lpost)[source]
Adapt the sampler for use as an inner sampler within a parallel tempering sampler.
Can be called by parallel tempering samplers during initialization. Default: Do nothing.
- Parameters:
  - temper_lpost (bool) – Whether to temper the posterior or only the likelihood.
- class pypesto.sample.ParallelTemperingSampler[source]
Bases:
Sampler
Simple parallel tempering sampler.
Parallel tempering is a Markov chain Monte Carlo (MCMC) method that uses multiple chains with different temperatures to sample from a probability distribution. The chains are coupled by swapping samples between them. This allows sampling from distributions with multiple modes more efficiently, as high-temperature chains can jump between modes, while low-temperature chains can sample the modes more precisely.
This implementation is based on:
Vousden et al. 2016. Dynamic temperature selection for parallel tempering in Markov chain Monte Carlo simulations (https://doi.org/10.1093/mnras/stv2422),
via a matlab-based reference implementation (https://github.com/ICB-DCM/PESTO/blob/master/private/performPT.m).
- __init__(internal_sampler, betas=None, n_chains=None, options=None)[source]
- Parameters:
  - internal_sampler (InternalSampler)
  - betas (Sequence[float])
  - n_chains (int)
  - options (dict)
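For example, with an explicit inverse-temperature ladder (the values here are arbitrary):

```python
import pypesto.sample as sample

sampler = sample.ParallelTemperingSampler(
    internal_sampler=sample.MetropolisSampler(),
    # Inverse temperatures; beta = 1 is the untempered posterior.
    betas=[1.0, 0.5, 0.25, 0.1],
)
```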
- class pypesto.sample.Sampler[source]
Bases:
ABC
Sampler base class, not functional on its own.
The sampler maintains an internal chain, which is initialized in initialize, and updated in sample.
- classmethod default_options()[source]
Set/Get default options.
- Returns:
default_options – Default sampler options.
- pypesto.sample.bridge_sampling_log_evidence(result, n_posterior_samples_init=None, initial_guess_log_evidence=None, max_iter=1000, tol=1e-06)[source]
Compute the log evidence using bridge sampling.
Based on “A Tutorial on Bridge Sampling” by Gronau et al. (2017): https://doi.org/10.1016/j.jmp.2017.09.005. The optimal bridge function by Meng and Wong (1996) is used, which minimises the relative mean-squared error. The proposal function is calibrated with a subset of the posterior samples, which are then excluded from the final bridge estimate (as including them may result in an underestimation of the marginal likelihood, see Overstall and Forster (2010)).
- Parameters:
  - result (Result) – The pyPESTO result object with filled sample result.
  - n_posterior_samples_init (Optional[int]) – Number of samples used to calibrate the proposal function. By default, half of the posterior samples are used.
  - initial_guess_log_evidence (Optional[float]) – Initial guess for the log evidence. By default, the Laplace approximation is used to compute the initial guess.
  - max_iter (int) – Maximum number of iterations. Default is 1000.
  - tol (float) – Tolerance for convergence. Default is 1e-6.
- Returns:
log_evidence
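A usage sketch, assuming result already contains posterior samples:

```python
import pypesto.sample as sample

# By default, half of the posterior samples calibrate the proposal function.
log_evidence = sample.bridge_sampling_log_evidence(result, max_iter=1000, tol=1e-6)
```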
- pypesto.sample.calculate_ci_mcmc_sample(result, ci_level=0.95, exclude_burn_in=True)[source]
Calculate parameter credibility intervals based on MCMC samples.
- Parameters:
  - result (Result)
  - ci_level (float)
  - exclude_burn_in (bool)
- Returns:
lb, ub – Bounds of the MCMC percentile-based credibility interval.
- pypesto.sample.calculate_ci_mcmc_sample_prediction(simulated_values, ci_level=0.95)[source]
Calculate prediction credibility intervals based on MCMC samples.
- pypesto.sample.effective_sample_size(result)[source]
Calculate the effective sample size of the MCMC chains.
- pypesto.sample.geweke_test(result, zscore=2.0, chain_number=0)[source]
Calculate the burn-in of MCMC chains.
- Parameters:
  - result (Result)
  - zscore (float)
  - chain_number (int)
- Returns:
burn_in – Iteration after which the first and the last fraction of the chain do not differ significantly according to the Geweke test, i.e. the burn-in index.
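A usage sketch; the trace_x indexing assumes the usual (n_chains, n_iterations, n_parameters) sample layout:

```python
import pypesto.sample as sample

burn_in = sample.geweke_test(result, zscore=2.0, chain_number=0)
samples = result.sample_result.trace_x[0, burn_in:, :]  # discard burn-in of chain 0
```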
- pypesto.sample.harmonic_mean_log_evidence(result, prior_samples=None, neg_log_likelihood_fun=None)[source]
Compute the log evidence using the harmonic mean estimator.
The stabilized harmonic mean estimator is used if prior samples are provided. See Newton and Raftery (1994): https://doi.org/10.1111/j.2517-6161.1994.tb01956.x
- Parameters:
  - result (Result)
  - prior_samples (Optional[ndarray]) – Samples from the prior distribution. If samples from the prior are provided, the stabilized harmonic mean is computed (recommended). Then, the likelihood function must be provided as well.
  - neg_log_likelihood_fun (callable) – Function to evaluate the negative log likelihood. Necessary if prior_samples is not None.
- Returns:
log_evidence
- pypesto.sample.laplace_approximation_log_evidence(problem, x)[source]
Compute the log evidence using the Laplace approximation.
The objective in your problem must be a negative log posterior and support Hessian computation.
- pypesto.sample.parallel_tempering_log_evidence(result, method='trapezoid', use_all_chains=True)[source]
Perform thermodynamic integration or steppingstone sampling to estimate the log evidence.
Thermodynamic integration is performed by integrating the mean log likelihood over the temperatures. Errors might come from the samples themselves or from the numerical integration. Steppingstone sampling is a form of importance sampling that uses the maximum likelihood of each temperature. It does not require an integration, but can be biased for a small number of temperatures. See Annis et al. (2019), https://doi.org/10.1016/j.jmp.2019.01.005, for more details.
This should be used with a beta-decay temperature schedule and not with the adaptive version of parallel tempering sampling, as the adaptive temperature schedule is not optimal for thermodynamic integration.
- Parameters:
  - result (Result) – Result object containing the samples.
  - method (str) – Integration method, either ‘trapezoid’ or ‘simpson’ to perform thermodynamic integration (uses scipy for integration) or ‘steppingstone’ to perform steppingstone sampling.
  - use_all_chains (bool) – If True, calculate burn-in for each chain and use the maximal burn-in for all chains for the integration. This will fail if not all chains have converged yet. Otherwise, use only the converged chains for the integration (might increase the integration error).
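A usage sketch, assuming result stems from a (non-adaptive) parallel tempering run with a beta-decay schedule:

```python
import pypesto.sample as sample

# Thermodynamic integration over the temperature ladder:
log_evidence = sample.parallel_tempering_log_evidence(result, method="trapezoid")

# Steppingstone sampling as an alternative:
log_evidence_ss = sample.parallel_tempering_log_evidence(result, method="steppingstone")
```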
- pypesto.sample.sample(problem, n_samples, sampler=None, x0=None, result=None, filename=None, overwrite=False)[source]
Call to do parameter sampling.
- Parameters:
  - problem (Problem) – The problem to be solved.
  - n_samples (Optional[int]) – Number of samples to generate. None can be used if the sampler does not use n_samples.
  - sampler (Sampler) – The sampler to perform the actual sampling. If None is provided, a pypesto.sample.AdaptiveMetropolisSampler is used.
  - x0 (Union[ndarray, list[ndarray]]) – Initial parameter for the Markov chain. If None, the best parameter found in optimization is used. Note that some samplers require an initial parameter, while some may ignore it. x0 can also be a list, to have separate starting points for parallel tempering chains.
  - result (Result) – A result to write to. If None is provided, one is created from the problem.
  - filename (Union[str, Callable, None]) – Name of the hdf5 file where the result will be saved. Default is None, which deactivates automatic saving. If set to “Auto”, it will automatically generate a file named year_month_day_sampling_result.hdf5. Optionally a method, see docs for pypesto.store.auto.autosave.
  - overwrite (bool) – Whether to overwrite result/sampling in the autosave file if it already exists.
- Returns:
result – A result with a filled sample_result part.
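Putting it together, a sketch with automatic saving enabled:

```python
import pypesto.sample as sample

result = sample.sample(
    problem,
    n_samples=5_000,
    sampler=sample.AdaptiveMetropolisSampler(),
    filename="Auto",   # autosave to a date-stamped hdf5 file
    overwrite=True,    # replace an existing sampling result in that file
)
```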