A sampler study

In this notebook, we perform a short study of how various samplers implemented in pyPESTO perform.

The pipeline

First, we show a typical workflow, fully integrating the samplers with a PEtab problem, using a toy example of a conversion reaction.

[1]:
import pypesto
import petab

# import to petab
petab_problem = petab.Problem.from_yaml(
    "conversion_reaction/conversion_reaction.yaml")
# import to pypesto
importer = pypesto.PetabImporter(petab_problem)
# create problem
problem = importer.create_problem()

Commonly, as a first step, optimization is performed, in order to find good parameter point estimates.

[2]:
%%time
result = pypesto.minimize(problem, n_starts=10)
Parameters obtained from history and optimizer do not match: [-0.91620777 -9.1644571 ], [-0.9160963  -9.16577932]
CPU times: user 937 ms, sys: 5.21 ms, total: 943 ms
Wall time: 943 ms
[3]:
ax = pypesto.visualize.waterfall(result, size=(4,4))
../_images/example_sampler_study_7_0.png

Next, we perform sampling. Here, we employ a pypesto.sample.AdaptiveParallelTemperingSampler, which runs Markov chain Monte Carlo (MCMC) chains at different temperatures. Within each chain, we employ a pypesto.sample.AdaptiveMetropolisSampler. For more on the samplers, see below or the API documentation.

[4]:
sampler = pypesto.AdaptiveParallelTemperingSampler(
    internal_sampler=pypesto.AdaptiveMetropolisSampler(),
    n_chains=3)

For the actual sampling, we call the pypesto.sample function. By passing the result object to the function, the best parameter vector found in the preceding optimization is used as the starting point for the MCMC sampling.

[5]:
%%time
result = pypesto.sample(problem, n_samples=10000, sampler=sampler, result=result)
100%|██████████| 10000/10000 [00:30<00:00, 330.85it/s]
CPU times: user 29.8 s, sys: 445 ms, total: 30.2 s
Wall time: 30.3 s

When the sampling is finished, we can analyze the results. A first step is to determine the sampling burn-in with a Geweke test:

[6]:
pypesto.geweke_test(result)
[6]:
0
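
The value returned above is the burn-in index, i.e. the number of initial samples to discard (here none). To illustrate the idea behind a Geweke-type test, the following is a minimal, dependency-free sketch. It is not pyPESTO's implementation: the function names, the candidate start indices and the fallback are assumptions made for this example, and it uses plain sample variances instead of the spectral density estimates of the full diagnostic.

```python
import math
import random
import statistics


def geweke_z(chain, first=0.1, last=0.5):
    """z-score comparing the mean of the first 10% with the last 50% of a chain.

    Simplification: plain sample variances are used, which ignores
    autocorrelation (the full diagnostic uses spectral density estimates).
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[int((1 - last) * n):]
    var_of_means = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_of_means)


def burn_in_index(chain, n_fragments=20, z_threshold=2.0):
    """Smallest candidate start index from which the chain passes the test."""
    n = len(chain)
    for k in range(n_fragments):
        start = k * n // (2 * n_fragments)  # candidates cover the first half
        if abs(geweke_z(chain[start:])) < z_threshold:
            return start
    return n // 2  # hypothetical fallback: discard the first half


rng = random.Random(0)
# Synthetic chain: a transient decaying from 5 towards 0, plus stationary noise.
chain = [5 * math.exp(-i / 50) + rng.gauss(0, 1) for i in range(4000)]
stationary = [rng.gauss(0, 1) for _ in range(4000)]

print("burn-in (with transient):", burn_in_index(chain))
print("z-score (stationary):", round(geweke_z(stationary), 2))
```

A large absolute z-score indicates that the early part of the chain has a different mean than the late part, so the chain start should be cut off.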

pyPESTO provides functions to analyze both the sampling process and the obtained sampling result. Visualizing the traces, for example, allows one to detect burn-in phases or to fine-tune hyperparameters. First, the parameter trajectories can be visualized:

[7]:
pypesto.geweke_test(result)
ax = pypesto.visualize.sampling_parameters_trace(result, use_problem_bounds=False)
../_images/example_sampler_study_15_0.png

Next, the log-posterior trace can be visualized as well:

[8]:
ax = pypesto.visualize.sampling_fval_trace(result)
../_images/example_sampler_study_17_0.png

There are various options to visualize the result. The scatter plot shows histograms of the 1-dimensional parameter marginals and scatter plots of the 2-dimensional parameter combinations:

[9]:
ax = pypesto.visualize.sampling_scatter(result, size=[13,6])
../_images/example_sampler_study_19_0.png

sampling_1d_marginals plots, e.g., kernel density estimates or histograms of the marginals (internally using seaborn):

[10]:
for i_chain in range(len(result.sample_result.betas)):
    pypesto.visualize.sampling_1d_marginals(
        result, i_chain=i_chain, suptitle=f"Chain: {i_chain}")
../_images/example_sampler_study_21_0.png
../_images/example_sampler_study_21_1.png
../_images/example_sampler_study_21_2.png

That’s it for the moment on using the sampling pipeline.

1-dim test problem

To compare and test the various implemented samplers, we first study a 1-dimensional test problem with a Gaussian mixture density and a flat prior.

[11]:
import numpy as np
from scipy.stats import multivariate_normal
import seaborn as sns
import pypesto

def density(x):
    return 0.3*multivariate_normal.pdf(x, mean=-1.5, cov=0.1) + \
        0.7*multivariate_normal.pdf(x, mean=2.5, cov=0.2)

def nllh(x):
    return - np.log(density(x))

objective = pypesto.Objective(fun=nllh)
problem = pypesto.Problem(
    objective=objective, lb=-4, ub=5, x_names=['x'])

The likelihood has two separate modes:

[12]:
xs = np.linspace(-4, 5, 100)
ys = [density(x) for x in xs]

# pass x and y as keywords; recent seaborn versions no longer accept them positionally
ax = sns.lineplot(x=xs, y=ys, color='C1')
../_images/example_sampler_study_27_0.png
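
The separation of the two modes can also be checked numerically. Note that the `cov` argument of `multivariate_normal.pdf` is a variance, so the standard deviations of the two components are sqrt(0.1) ≈ 0.32 and sqrt(0.2) ≈ 0.45. The sketch below re-implements the mixture with the standard library only and integrates it on both sides of x = 0. The split point and the helper names are choices made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-dimensional normal distribution with variance `var`."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def density(x):
    """Same mixture as in the notebook: weights 0.3 and 0.7."""
    return 0.3 * normal_pdf(x, -1.5, 0.1) + 0.7 * normal_pdf(x, 2.5, 0.2)


def trapezoid(f, a, b, n=10_000):
    """Composite trapezoidal rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


mass_left = trapezoid(density, -4.0, 0.0)   # mode around x = -1.5
mass_right = trapezoid(density, 0.0, 5.0)   # mode around x = 2.5
valley = density(0.0)                       # density between the modes

print(f"left mode mass:  {mass_left:.4f}")
print(f"right mode mass: {mass_right:.4f}")
print(f"density at x=0:  {valley:.2e}")
```

The density between the modes is several orders of magnitude below the peaks, so a chain taking small local steps rarely crosses from one mode to the other.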

Metropolis sampler

For this problem, let us try out the simplest sampler, the pypesto.sample.MetropolisSampler.

[13]:
%%time
sampler = pypesto.MetropolisSampler({'std': 0.5})
result = pypesto.sample(problem, 1e4, sampler, x0=np.array([0.5]))
100%|██████████| 10000/10000 [00:03<00:00, 2644.33it/s]
CPU times: user 3.75 s, sys: 97.4 ms, total: 3.85 s
Wall time: 3.8 s

[14]:
pypesto.geweke_test(result)
ax = pypesto.visualize.sampling_1d_marginals(result)
ax[0][0].plot(xs, ys)
[14]:
[<matplotlib.lines.Line2D at 0x7f7260a7baf0>]
../_images/example_sampler_study_31_1.png

The obtained samples do not accurately represent the distribution and often capture only one mode. This is because it is hard for the Markov chain to jump between the distribution's two modes. This can be fixed by choosing a larger proposal standard deviation std:

[15]:
%%time
sampler = pypesto.MetropolisSampler({'std': 1})
result = pypesto.sample(problem, 1e4, sampler, x0=np.array([0.5]))
100%|██████████| 10000/10000 [00:03<00:00, 3167.94it/s]
CPU times: user 3.23 s, sys: 113 ms, total: 3.34 s
Wall time: 3.17 s
[16]:
pypesto.geweke_test(result)
ax = pypesto.visualize.sampling_1d_marginals(result)
ax[0][0].plot(xs, ys)
[16]:
[<matplotlib.lines.Line2D at 0x7f725abaec10>]
../_images/example_sampler_study_34_1.png
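
The influence of the proposal standard deviation can be explored without pyPESTO. The following is a minimal sketch of a random-walk Metropolis sampler on the same mixture density, assuming Gaussian proposals and a flat prior on [-4, 5]. The helper names and the split point 0.5 used to count mode switches are choices made for this example, not part of pyPESTO.

```python
import math
import random


def log_density(x):
    """Log of the notebook's Gaussian mixture; -inf outside the bounds [-4, 5]."""
    if not -4.0 <= x <= 5.0:
        return -math.inf
    left = 0.3 * math.exp(-0.5 * (x + 1.5) ** 2 / 0.1) / math.sqrt(2 * math.pi * 0.1)
    right = 0.7 * math.exp(-0.5 * (x - 2.5) ** 2 / 0.2) / math.sqrt(2 * math.pi * 0.2)
    return math.log(left + right)


def metropolis(n_samples, std, x0=0.5, seed=0):
    """Random-walk Metropolis with Gaussian proposals of standard deviation `std`."""
    rng = random.Random(seed)
    x, lp = x0, log_density(x0)
    trace, n_accepted = [], 0
    for _ in range(n_samples):
        x_new = x + rng.gauss(0.0, std)
        lp_new = log_density(x_new)
        # Accept with probability min(1, p(x_new) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = x_new, lp_new
            n_accepted += 1
        trace.append(x)
    return trace, n_accepted / n_samples


def mode_switches(trace, split=0.5):
    """Number of times consecutive samples lie on opposite sides of `split`."""
    return sum((a < split) != (b < split) for a, b in zip(trace, trace[1:]))


for std in (0.5, 1.0, 3.0):
    trace, acc = metropolis(10_000, std)
    frac_left = sum(x < 0.5 for x in trace) / len(trace)
    print(f"std={std}: acceptance={acc:.2f}, "
          f"mode switches={mode_switches(trace)}, left-mode fraction={frac_left:.2f}")
```

For a well-mixed chain, the left-mode fraction should be close to the mixture weight 0.3. Larger steps make mode switches possible but lower the acceptance rate, so the choice of std is a trade-off.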

In general, MCMC samplers have difficulties exploring multimodal landscapes. One way to overcome this is to use parallel tempering. There, several chains are run, lifting the densities to different temperatures. At high temperatures, proposed steps are more likely to get accepted, and thus jumps between modes are more likely.
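
In the standard formulation of parallel tempering, a chain at inverse temperature beta samples from p(x)^beta, and two chains i and j exchange their states with probability min(1, exp((beta_i - beta_j) * (log p(x_j) - log p(x_i)))). The sketch below implements these two textbook formulas. It does not reproduce pyPESTO's internals, and the function names are chosen for this example.

```python
import math


def tempered_log_density(log_p, beta):
    """Log density of a chain at inverse temperature beta: log(p^beta) = beta * log p."""
    return beta * log_p


def swap_probability(beta_i, beta_j, log_p_i, log_p_j):
    """Probability of swapping the states of two tempered chains.

    log_p_i and log_p_j are the untempered log densities at the current
    states of chains i and j.
    """
    log_ratio = (beta_i - beta_j) * (log_p_j - log_p_i)
    return math.exp(min(0.0, log_ratio))


# A density ratio of exp(-10), e.g. between a valley and a mode ...
gap = -10.0
for beta in (1.0, 0.1, 0.01):
    # ... becomes exp(beta * gap) in the tempered chain.
    ratio = math.exp(tempered_log_density(gap, beta))
    print(f"beta={beta}: tempered density ratio = {ratio:.3g}")

# The hot chain (beta=0.1) has found a better point than the cold chain (beta=1):
print(swap_probability(1.0, 0.1, log_p_i=-5.0, log_p_j=-1.0))
# The hot chain sits at a worse point than the cold chain:
print(swap_probability(1.0, 0.1, log_p_i=-1.0, log_p_j=-5.0))
```

Swaps that move the better point to the colder chain are always accepted, which is how states found by the hot chains propagate down to the chain at beta = 1.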

Parallel tempering sampler

In pyPESTO, the most basic parallel tempering algorithm is the pypesto.sample.ParallelTemperingSampler. It takes an internal_sampler parameter to specify which sampler to use within the different chains. Further, we can directly specify which inverse temperatures betas to use. When only the number of chains n_chains is specified instead of explicit betas, an established near-exponential decay scheme is used.
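
To give a feeling for how a temperature ladder can be constructed, the following sketch generates geometrically decaying inverse temperatures. This is only an illustration: it is not pyPESTO's near-exponential decay scheme, and the function name and the beta_min parameter are assumptions made for this example.

```python
def geometric_betas(n_chains, beta_min=1e-2):
    """Inverse temperatures decaying geometrically from 1 down to beta_min."""
    if n_chains == 1:
        return [1.0]
    ratio = beta_min ** (1.0 / (n_chains - 1))
    return [ratio ** k for k in range(n_chains)]


# For three chains this gives the ladder 1, 1e-1, 1e-2 (up to rounding).
print(geometric_betas(3))
print(geometric_betas(5))
```

The first entry is always 1, so the first chain samples from the untempered posterior.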

[17]:
%%time
sampler = pypesto.ParallelTemperingSampler(
    internal_sampler=pypesto.MetropolisSampler(),
    betas=[1, 1e-1, 1e-2])
result = pypesto.sample(problem, 1e4, sampler, x0=np.array([0.5]))
100%|██████████| 10000/10000 [00:11<00:00, 899.49it/s]
CPU times: user 11.2 s, sys: 473 ms, total: 11.7 s
Wall time: 11.1 s

[18]:
pypesto.geweke_test(result)
for i_chain in range(len(result.sample_result.betas)):
    pypesto.visualize.sampling_1d_marginals(
        result, i_chain=i_chain, suptitle=f"Chain: {i_chain}")
../_images/example_sampler_study_39_0.png
../_images/example_sampler_study_39_1.png
../_images/example_sampler_study_39_2.png

Ultimately, the chain of interest is the first one, at index i_chain=0 with inverse temperature 1, which approximates the posterior well.

Adaptive Metropolis sampler

The problem of having to specify the proposal step variation manually can be overcome by using the pypesto.sample.AdaptiveMetropolisSampler, which iteratively adjusts the proposal steps to the function landscape.

[19]:
%%time
sampler = pypesto.AdaptiveMetropolisSampler()
result = pypesto.sample(problem, 1e4, sampler, x0=np.array([0.5]))
100%|██████████| 10000/10000 [00:04<00:00, 2292.14it/s]
CPU times: user 4.42 s, sys: 24 ms, total: 4.45 s
Wall time: 4.38 s

[20]:
pypesto.geweke_test(result)
ax = pypesto.visualize.sampling_1d_marginals(result)
../_images/example_sampler_study_44_0.png
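
The adaptation idea can be sketched in one dimension as follows: the proposal variance is set to a multiple of the running variance of the chain, in the spirit of the adaptive Metropolis algorithm of Haario et al. This is a simplified illustration and not pyPESTO's update rule. The scaling factor 2.4², the regularization eps, the warm-up length and all names are assumptions made for this example.

```python
import math
import random


def log_density(x):
    """Log of the notebook's Gaussian mixture; -inf outside the bounds [-4, 5]."""
    if not -4.0 <= x <= 5.0:
        return -math.inf
    left = 0.3 * math.exp(-0.5 * (x + 1.5) ** 2 / 0.1) / math.sqrt(2 * math.pi * 0.1)
    right = 0.7 * math.exp(-0.5 * (x - 2.5) ** 2 / 0.2) / math.sqrt(2 * math.pi * 0.2)
    return math.log(left + right)


def adaptive_metropolis(n_samples, x0=0.5, std0=0.1, scale=2.4 ** 2,
                        eps=1e-6, n_warmup=100, seed=0):
    """1-dimensional adaptive Metropolis: the proposal variance tracks the chain variance."""
    rng = random.Random(seed)
    x, lp = x0, log_density(x0)
    mean, m2 = x0, 0.0  # running mean and sum of squared deviations (Welford)
    std = std0
    trace, stds = [], []
    for n in range(1, n_samples + 1):
        x_new = x + rng.gauss(0.0, std)
        lp_new = log_density(x_new)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = x_new, lp_new
        trace.append(x)
        # Update the running variance of the chain with the current state.
        delta = x - mean
        mean += delta / (n + 1)
        m2 += delta * (x - mean)
        # After the warm-up: proposal variance = scale * chain variance (+ eps).
        if n > n_warmup:
            std = math.sqrt(scale * m2 / n + eps)
        stds.append(std)  # proposal std used for the next step
    return trace, stds


trace, stds = adaptive_metropolis(10_000)
print(f"initial proposal std: {stds[0]:.2f}, final proposal std: {stds[-1]:.2f}")
print(f"left-mode fraction: {sum(x < 0.5 for x in trace) / len(trace):.2f}")
```

The proposal width no longer has to be guessed in advance, since it grows or shrinks with the spread of the samples seen so far.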

Adaptive parallel tempering sampler

The pypesto.sample.AdaptiveParallelTemperingSampler iteratively adjusts the temperatures to obtain good swapping rates between chains.

[21]:
%%time
sampler = pypesto.AdaptiveParallelTemperingSampler(
    internal_sampler=pypesto.AdaptiveMetropolisSampler(), n_chains=3)
result = pypesto.sample(problem, 1e4, sampler, x0=np.array([0.5]))
100%|██████████| 10000/10000 [00:12<00:00, 803.56it/s]
CPU times: user 12.5 s, sys: 151 ms, total: 12.7 s
Wall time: 12.5 s
[22]:
pypesto.geweke_test(result)
for i_chain in range(len(result.sample_result.betas)):
    pypesto.visualize.sampling_1d_marginals(
        result, i_chain=i_chain, suptitle=f"Chain: {i_chain}")
../_images/example_sampler_study_48_0.png
../_images/example_sampler_study_48_1.png
../_images/example_sampler_study_48_2.png
[23]:
result.sample_result.betas
[23]:
array([1.00000000e+00, 2.20932285e-01, 2.00000000e-05])
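
Since the betas are inverse temperatures, they can be converted to temperatures by taking the reciprocal. The short calculation below uses the values printed above.

```python
import math

betas = [1.00000000e+00, 2.20932285e-01, 2.00000000e-05]  # output of the cell above

# Temperature is the reciprocal of the inverse temperature beta.
temperatures = [1.0 / beta for beta in betas]
for i_chain, (beta, temperature) in enumerate(zip(betas, temperatures)):
    print(f"chain {i_chain}: beta = {beta:.3g}, T = {temperature:.3g}")

# In the hottest chain, a density ratio of exp(-10) is flattened to almost 1.
flattened_ratio = math.exp(betas[-1] * -10.0)
print(f"flattened ratio in the hottest chain: {flattened_ratio:.4f}")
```

The hottest chain thus samples an almost flat density within the bounds, while the chain at beta = 1 samples the actual posterior.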

Pymc3 sampler

[24]:
%%time
sampler = pypesto.Pymc3Sampler()
result = pypesto.sample(problem, 1e4, sampler, x0=np.array([0.5]))
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Initializing NUTS failed. Falling back to elementwise auto-assignment.
Sequential sampling (1 chains in 1 job)
Slice: [x]
Sampling chain 0, 0 divergences: 100%|██████████| 10500/10500 [00:24<00:00, 422.31it/s]
Only one chain was sampled, this makes it impossible to run some convergence checks
CPU times: user 28.2 s, sys: 1.1 s, total: 29.3 s
Wall time: 29.5 s
[25]:
pypesto.geweke_test(result)
for i_chain in range(len(result.sample_result.betas)):
    pypesto.visualize.sampling_1d_marginals(
        result, i_chain=i_chain, suptitle=f"Chain: {i_chain}")
../_images/example_sampler_study_52_0.png

If no step method is specified, pymc3 chooses one automatically. In the run above, the initialization of NUTS failed, so pymc3 fell back to a slice sampler.

2-dim test problem: Rosenbrock banana

The adaptive parallel tempering sampler with chains running adaptive Metropolis samplers is also able to sample from more challenging posterior distributions. To illustrate this briefly, we use the Rosenbrock function.
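
As in the 1-dimensional example, where the objective was the negative log density, the objective function plays the role of a negative log posterior, so the sampled density is proportional to exp(-rosen(x)). The sketch below re-implements the Rosenbrock formula with the standard library only and evaluates this unnormalized density at a few points. The helper names and the chosen points are assumptions made for this example.

```python
import math


def rosen(x):
    """Rosenbrock function, same formula as scipy.optimize.rosen."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


def unnormalized_density(x):
    """Density that is sampled when rosen is used as negative log posterior."""
    return math.exp(-rosen(x))


points = (
    [1.0, 1.0, 1.0, 1.0],    # global minimum of rosen, maximum of the density
    [0.0, 0.0, 0.0, 0.0],    # starting point used for sampling
    [-1.0, 1.0, 1.0, 1.0],   # still satisfies x[i+1] = x[i]**2
    [1.0, 0.0, 1.0, 1.0],    # violates x[i+1] = x[i]**2
)
for point in points:
    print(point, rosen(point), unnormalized_density(point))
```

Points on the curved valley x[i+1] = x[i]**2 keep a comparatively high density, whereas leaving the valley is penalized by the factor 100, which gives the distribution its banana shape.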

[26]:
import scipy.optimize as so
import pypesto

# first type of objective
objective = pypesto.Objective(fun=so.rosen)

dim_full = 4
lb = -5 * np.ones((dim_full, 1))
ub = 5 * np.ones((dim_full, 1))

problem = pypesto.Problem(objective=objective, lb=lb, ub=ub)
[27]:
%%time
sampler = pypesto.AdaptiveParallelTemperingSampler(
    internal_sampler=pypesto.AdaptiveMetropolisSampler(), n_chains=10)
result = pypesto.sample(problem, 1e4, sampler, x0=np.zeros(dim_full))
100%|██████████| 10000/10000 [00:31<00:00, 316.47it/s]
CPU times: user 31.7 s, sys: 148 ms, total: 31.9 s
Wall time: 31.7 s
[28]:
ax = pypesto.visualize.sampling_scatter(result)
ax = pypesto.visualize.sampling_1d_marginals(result)
Burn in index not found in the results, the full chain will be shown.
You may want to use, e.g., 'pypesto.sampling.geweke_test'.
Burn in index not found in the results, the full chain will be shown.
You may want to use, e.g., 'pypesto.sampling.geweke_test'.
../_images/example_sampler_study_58_1.png
../_images/example_sampler_study_58_2.png