pyPESTO: Getting started
This notebook takes you through the first steps to get started with pyPESTO.
pyPESTO is a Python package for parameter inference, offering a unified interface to various optimization and sampling methods. It is highly modular and customizable, e.g., with respect to the objective function definition and the employed inference algorithms.
[1]:
import logging
import amici
import matplotlib as mpl
import numpy as np
import scipy as sp
import pypesto.optimize as optimize
import pypesto.petab
import pypesto.visualize as visualize
np.random.seed(1)
# increase image resolution
mpl.rcParams["figure.dpi"] = 300
1. Objective Definition
pyPESTO allows the definition of custom objectives and offers support for objectives defined in the PEtab format.
Custom Objective Definition
You can define an objective via a Python function. Additionally providing an analytical gradient (and possibly also a Hessian) improves the performance of gradient/Hessian-based optimizers. When assessing parameter uncertainties via profile likelihoods or sampling, pyPESTO interprets the objective function as the negative log-likelihood/negative log-posterior. A more in-depth construction of a custom objective function can be found in a dedicated example notebook.
[2]:
# define objective function
def f(x: np.ndarray):
    return x[0] ** 2 + x[1] ** 2
# define gradient
def grad(x: np.ndarray):
    return 2 * x
# define objective
custom_objective = pypesto.Objective(fun=f, grad=grad)
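The resulting objective can be called directly, which is a quick way to check that function and gradient are consistent. A minimal sketch (assuming the sensi_orders argument of pyPESTO objectives, which selects the requested derivative orders):
x_test = np.array([1.0, 2.0])
print(custom_objective(x_test))  # function value: 5.0
print(custom_objective(x_test, sensi_orders=(1,)))  # gradient: [2., 4.]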
Define lower and upper parameter bounds and create an optimization problem.
[3]:
# define optimization bounds
lb = np.array([-10, -10])
ub = np.array([10, 10])
# create problem
custom_problem = pypesto.Problem(objective=custom_objective, lb=lb, ub=ub)
Now choose an optimizer to perform the optimization. minimize uses multi-start optimization, meaning that the optimization is run n_starts times from different initial values, in case the problem contains multiple local optima (which is of course not the case for this toy problem).
[4]:
# choose optimizer
optimizer = optimize.ScipyOptimizer()
# do the optimization
result_custom_problem = optimize.minimize(
    problem=custom_problem, optimizer=optimizer, n_starts=10
)
result_custom_problem.optimize_result now stores a list that contains the results and metadata of the individual optimizer runs (ordered by function value).
[5]:
# E.g., The best model fit was obtained by the following optimization run:
result_custom_problem.optimize_result.list[0]
[5]:
{'id': '1',
'x': array([0., 0.]),
'fval': 0.0,
'grad': array([0., 0.]),
'hess': None,
'res': None,
'sres': None,
'n_fval': 4,
'n_grad': 4,
'n_hess': 0,
'n_res': 0,
'n_sres': 0,
'x0': array([-9.9977125 , -3.95334855]),
'fval0': 115.58322004264032,
'history': <pypesto.history.base.CountHistory at 0x7fa118e1bf90>,
'exitflag': 0,
'time': 0.0014843940734863281,
'message': 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',
'optimizer': "<ScipyOptimizer method=L-BFGS-B options={'disp': False, 'maxfun': 1000}>",
'free_indices': array([0, 1]),
'inner_parameters': None,
'spline_knots': None}
[6]:
# Objective function values of the different optimizer runs:
result_custom_problem.optimize_result.get_for_key("fval")
[6]:
[0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
1.7609224440720174e-36,
3.429973604864721e-36,
6.379533727371235e-36,
8.94847274678846e-36]
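For a quick graphical overview of the individual starts, the result can also be passed to the waterfall plot used later in this notebook (a short sketch):
visualize.waterfall(result_custom_problem);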
Problem Definition via PEtab
Background on PEtab
PyPESTO supports the PEtab standard. PEtab is a data format for specifying parameter estimation problems in systems biology.
A PEtab problem consists of an SBML file defining the model topology, and a set of .tsv files defining experimental conditions, observables, measurements, and parameters (including their optimization bounds, scales, priors, …). All files that make up a PEtab problem can be referenced in a single .yaml file. The pypesto.Objective created from a PEtab problem corresponds to the negative log-likelihood/negative log-posterior of the parameters.
For more details on PEtab, the interested reader is referred to PEtab’s format definition; for examples, to the PEtab benchmark collection, which contains, among others, the model from Böhm et al. (J. Proteome Res. 2014). In this notebook, a simple conversion reaction model with two parameters is used as the running example.
PyPESTO provides an interface to the model simulation tool AMICI for the simulation of Ordinary Differential Equation (ODE) models specified in the SBML format.
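To get a feeling for what such a problem contains, the PEtab files used below can also be inspected directly with the petab Python package (a minimal sketch, assuming petab is installed; the .tsv tables are exposed as pandas DataFrames):
import petab
petab_problem = petab.Problem.from_yaml(
    "./conversion_reaction/conversion_reaction.yaml"
)
print(petab_problem.parameter_df)    # parameters, bounds, scales, priors
print(petab_problem.measurement_df)  # measurements per observable and condition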
Basic Model Import and Optimization
The first step is to import a PEtab problem and create a pypesto.Problem object:
[7]:
%%capture
# directory of the PEtab problem
petab_yaml = "./conversion_reaction/conversion_reaction.yaml"
importer = pypesto.petab.PetabImporter.from_yaml(petab_yaml)
problem = importer.create_problem(verbose=False)
Compiling amici model to folder /home/docs/checkouts/readthedocs.org/user_builds/pypesto/checkouts/stable/doc/example/amici_models/0.27.0/conversion_reaction_0.
Next, we choose an optimizer to perform the multi-start optimization.
[8]:
%%time
%%capture
# choose optimizer
optimizer = optimize.ScipyOptimizer()
# do the optimization
result = optimize.minimize(problem=problem, optimizer=optimizer, n_starts=5)
CPU times: user 193 ms, sys: 12.2 ms, total: 205 ms
Wall time: 205 ms
result.optimize_result
contains a list with the ordered optimization results.
[9]:
# E.g., best model fit was obtained by the following optimization run:
result.optimize_result.list[0]
[9]:
{'id': '3',
'x': array([-0.25416788, -0.60834112]),
'fval': -25.356201652406327,
'grad': array([ 4.01799935e-05, -2.92969345e-05]),
'hess': None,
'res': None,
'sres': None,
'n_fval': 36,
'n_grad': 36,
'n_hess': 0,
'n_res': 0,
'n_sres': 0,
'x0': array([ 4.42839188, -7.60243554]),
'fval0': 3275.6428996850577,
'history': <pypesto.history.base.CountHistory at 0x7fa114ee9110>,
'exitflag': 0,
'time': 0.08060836791992188,
'message': 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',
'optimizer': "<ScipyOptimizer method=L-BFGS-B options={'disp': False, 'maxfun': 1000}>",
'free_indices': array([0, 1]),
'inner_parameters': None,
'spline_knots': None}
[10]:
# Objective function values of the different optimizer runs:
result.optimize_result.get_for_key("fval")
[10]:
[-25.356201652406327,
49.9096545687522,
132.61011287146414,
132.61011287146414,
3275.7137352431205]
2. Optimizer Choice
PyPESTO provides a unified interface to a variety of optimizers of different types:
All scipy optimizers (optimize.ScipyOptimizer(method=<method_name>)): function-value- or least-squares-based optimizers; gradient- or Hessian-based optimizers
IpOpt (optimize.IpoptOptimizer()): interior point method
Dlib (optimize.DlibOptimizer(options={'maxiter': <max. number of iterations>})): global optimizer; gradient-free
FIDES (optimize.FidesOptimizer()): interior trust-region optimizer
Particle Swarm (optimize.PyswarmsOptimizer()): particle swarm algorithm; gradient-free
CMA-ES (optimize.CmaOptimizer()): Covariance Matrix Adaptation Evolution Strategy; evolutionary algorithm
[11]:
optimizer_scipy_lbfgsb = optimize.ScipyOptimizer(method="L-BFGS-B")
optimizer_scipy_powell = optimize.ScipyOptimizer(method="Powell")
optimizer_fides = optimize.FidesOptimizer(verbose=logging.ERROR)
optimizer_pyswarm = optimize.PyswarmsOptimizer(par_popsize=10)
The following performs multi-start optimization with different optimizers in order to compare their performance. For faster execution of this notebook, we run only 10 starts per optimizer; in applications, one would use many more optimization starts, around 100–1000 in most cases.
Note: fides and pyswarms need to be installed for this section to run. Furthermore, the computation time is on the order of minutes, so you might want to skip the execution and jump to the section on large scale models.
[12]:
%%time
%%capture --no-display
n_starts = 10
# Due to run time we already use parallelization.
# This will be introduced in more detail later.
engine = pypesto.engine.MultiProcessEngine()
# Scipy: L-BFGS-B
result_lbfgsb = optimize.minimize(
    problem=problem,
    optimizer=optimizer_scipy_lbfgsb,
    engine=engine,
    n_starts=n_starts,
)
# Scipy: Powell
result_powell = optimize.minimize(
    problem=problem,
    optimizer=optimizer_scipy_powell,
    engine=engine,
    n_starts=n_starts,
)
# Fides
result_fides = optimize.minimize(
    problem=problem,
    optimizer=optimizer_fides,
    engine=engine,
    n_starts=n_starts,
)
# PySwarms
result_pyswarm = optimize.minimize(
    problem=problem,
    optimizer=optimizer_pyswarm,
    engine=engine,
    # Global optimizers are usually run once; the number of particles
    # (par_popsize) is the parameter that is typically adapted instead.
    n_starts=1,
)
Engine will use up to 2 processes (= CPU count).
start 0 failed:
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/pypesto/envs/stable/lib/python3.11/site-packages/pypesto/optimize/optimizer.py", line 122, in wrapped_minimize
result = minimize(
^^^^^^^^^
File "/home/docs/checkouts/readthedocs.org/user_builds/pypesto/envs/stable/lib/python3.11/site-packages/pypesto/optimize/optimizer.py", line 251, in wrapped_minimize
return minimize(
^^^^^^^^^
File "/home/docs/checkouts/readthedocs.org/user_builds/pypesto/envs/stable/lib/python3.11/site-packages/pypesto/optimize/optimizer.py", line 974, in minimize
raise OptimizerImportError("pyswarms") from None
pypesto.optimize.optimizer.OptimizerImportError: Optimizer "pyswarms" not available, install corresponding package e.g. via "pip install pypesto[pyswarms]"
CPU times: user 169 ms, sys: 84.4 ms, total: 254 ms
Wall time: 3.22 s
Optimizer Convergence
A common visualization of optimizer convergence is the waterfall plot, which shows the ordered final objective values of the individual optimization runs. In general, we hope to obtain clearly visible plateaus, as they indicate optimizer convergence to (local) minima.
[13]:
optimizer_results = [
    result_lbfgsb,
    result_powell,
    result_fides,
]
optimizer_names = ["Scipy: L-BFGS-B", "Scipy: Powell", "Fides"]
pypesto.visualize.waterfall(optimizer_results, legends=optimizer_names);
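Besides the waterfall plot, the estimated parameter vectors of the individual starts can be compared across optimizers (a short sketch using the same results):
visualize.parameters(optimizer_results, legends=optimizer_names);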
Optimizer run time
Run time differs vastly among the optimizers, as can be seen below:
[14]:
print("Average Run time per start:")
print("-------------------")
for optimizer_name, optimizer_result in zip(
    optimizer_names, optimizer_results
):
    t = np.sum(optimizer_result.optimize_result.get_for_key("time")) / n_starts
    print(f"{optimizer_name}: {t:f} s")
Average Run time per start:
-------------------
Scipy: L-BFGS-B: 0.158370 s
Scipy: Powell: 0.164333 s
Fides: 0.081512 s
3. Fitting of large scale models
When fitting large scale models (i.e., models with more than 100 parameters and accordingly more data), two important issues are efficient gradient computation and parallelization.
Efficient gradient computation
As seen in the example above (and as is often confirmed in practice): if fast and reliable gradients are available, gradient-based optimizers are favourable with respect to both convergence and run time.
It has been shown that adjoint sensitivity analysis is a fast and reliable method to compute gradients for large scale models, since its run time is (asymptotically) independent of the number of parameters (Fröhlich et al. PlosCB 2017).
(Figure from Fröhlich et al. PlosCB 2017) Adjoint sensitivities are implemented in AMICI.
[15]:
# Set gradient computation method to adjoint
problem.objective.amici_solver.setSensitivityMethod(
    amici.SensitivityMethod.adjoint
)
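As a rough check, a single gradient evaluation with the adjoint setting can be timed at the best point found above (a sketch; sensi_orders=(1,) requests the gradient, and the dict-style access to the optimizer result follows the entries printed earlier in this notebook):
import timeit
x_best = result.optimize_result.list[0]["x"]
t_grad = timeit.timeit(
    lambda: problem.objective(x_best, sensi_orders=(1,)), number=10
)
print(f"adjoint gradient evaluation: {t_grad / 10:.4f} s")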
Parallelization
Multi-start optimization can easily be parallelized by using engines.
[16]:
%%time
%%capture
# Parallelize
engine = pypesto.engine.MultiProcessEngine()
# Optimize
result = optimize.minimize(
    problem=problem,
    optimizer=optimizer_scipy_lbfgsb,
    engine=engine,
    n_starts=25,
)
Engine will use up to 2 processes (= CPU count).
CPU times: user 171 ms, sys: 16.7 ms, total: 188 ms
Wall time: 7.01 s
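Besides MultiProcessEngine, pypesto.engine also provides a serial and a thread-based engine; which one pays off depends on the objective and the platform (a short sketch):
engine_serial = pypesto.engine.SingleCoreEngine()  # no parallelization
engine_threads = pypesto.engine.MultiThreadEngine(n_threads=4)  # thread-based
engine_procs = pypesto.engine.MultiProcessEngine(n_procs=4)  # process-based (used above)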
4. Uncertainty quantification
PyPESTO focuses on two approaches to assess parameter uncertainties:
Profile likelihoods
Sampling
Profile Likelihoods
Profile likelihoods compute confidence intervals via a likelihood ratio test. The profile for a parameter of interest is obtained by a maximum projection of the likelihood function onto that parameter, i.e., by maximizing the likelihood over all remaining parameters. The likelihood ratio test then gives a cut-off criterion via the \(\chi^2_1\) distribution.
In pyPESTO, this maximum projection is computed by solving a sequence of optimization problems along each parameter, and profiles can be obtained via
[17]:
%%time
%%capture
import pypesto.profile as profile
result = profile.parameter_profile(
problem=problem,
result=result,
optimizer=optimizer_scipy_lbfgsb,
profile_index=[0, 1],
)
Next guess for k1 in direction -1 is -0.2652. Step size: -0.0110.
Optimization successful for k1=-0.2652. Start fval -25.252136, end fval -25.339994.
Next guess for k1 in direction -1 is -0.2677. Step size: -0.0025.
Optimization successful for k1=-0.2677. Start fval -25.331897, end fval -25.331898.
Next guess for k1 in direction -1 is -0.2707. Step size: -0.0030.
Optimization successful for k1=-0.2707. Start fval -25.319754, end fval -25.319754.
Next guess for k1 in direction -1 is -0.2744. Step size: -0.0037.
Optimization successful for k1=-0.2744. Start fval -25.301538, end fval -25.301539.
Next guess for k1 in direction -1 is -0.2790. Step size: -0.0045.
Optimization successful for k1=-0.2790. Start fval -25.274217, end fval -25.274221.
Next guess for k1 in direction -1 is -0.2845. Step size: -0.0055.
Optimization successful for k1=-0.2845. Start fval -25.233248, end fval -25.233249.
Next guess for k1 in direction -1 is -0.2913. Step size: -0.0068.
Optimization successful for k1=-0.2913. Start fval -25.171795, end fval -25.171798.
Next guess for k1 in direction -1 is -0.2996. Step size: -0.0083.
Optimization successful for k1=-0.2996. Start fval -25.079635, end fval -25.079638.
Next guess for k1 in direction -1 is -0.3097. Step size: -0.0101.
Optimization successful for k1=-0.3097. Start fval -24.941415, end fval -24.941421.
Next guess for k1 in direction -1 is -0.3221. Step size: -0.0124.
Optimization successful for k1=-0.3221. Start fval -24.734072, end fval -24.734086.
Next guess for k1 in direction -1 is -0.3371. Step size: -0.0151.
Optimization successful for k1=-0.3371. Start fval -24.423224, end fval -24.423251.
Next guess for k1 in direction -1 is -0.3555. Step size: -0.0184.
Optimization successful for k1=-0.3555. Start fval -23.957222, end fval -23.957278.
Next guess for k1 in direction -1 is -0.3780. Step size: -0.0224.
Optimization successful for k1=-0.3780. Start fval -23.258651, end fval -23.258819.
Next guess for k1 in direction 1 is -0.2229. Step size: 0.0313.
Optimization successful for k1=-0.2229. Start fval -25.227230, end fval -25.227443.
Next guess for k1 in direction 1 is -0.2158. Step size: 0.0071.
Optimization successful for k1=-0.2158. Start fval -25.163144, end fval -25.163152.
Next guess for k1 in direction 1 is -0.2071. Step size: 0.0087.
Optimization successful for k1=-0.2071. Start fval -25.066656, end fval -25.066660.
Next guess for k1 in direction 1 is -0.1964. Step size: 0.0107.
Optimization successful for k1=-0.1964. Start fval -24.921936, end fval -24.921940.
Next guess for k1 in direction 1 is -0.1833. Step size: 0.0131.
Optimization successful for k1=-0.1833. Start fval -24.704998, end fval -24.705008.
Next guess for k1 in direction 1 is -0.1672. Step size: 0.0161.
Optimization successful for k1=-0.1672. Start fval -24.379620, end fval -24.379638.
Next guess for k1 in direction 1 is -0.1473. Step size: 0.0199.
Optimization successful for k1=-0.1473. Start fval -23.891543, end fval -23.891589.
Next guess for k1 in direction 1 is -0.1228. Step size: 0.0245.
Optimization successful for k1=-0.1228. Start fval -23.159370, end fval -23.159465.
Next guess for k2 in direction -1 is -0.6282. Step size: -0.0198.
Optimization successful for k2=-0.6282. Start fval -25.251206, end fval -25.339903.
Next guess for k2 in direction -1 is -0.6326. Step size: -0.0045.
Optimization successful for k2=-0.6326. Start fval -25.331764, end fval -25.331764.
Next guess for k2 in direction -1 is -0.6381. Step size: -0.0055.
Optimization successful for k2=-0.6381. Start fval -25.319548, end fval -25.319550.
Next guess for k2 in direction -1 is -0.6449. Step size: -0.0067.
Optimization successful for k2=-0.6449. Start fval -25.301231, end fval -25.301233.
Next guess for k2 in direction -1 is -0.6532. Step size: -0.0083.
Optimization successful for k2=-0.6532. Start fval -25.273757, end fval -25.273761.
Next guess for k2 in direction -1 is -0.6633. Step size: -0.0102.
Optimization successful for k2=-0.6633. Start fval -25.232550, end fval -25.232552.
Next guess for k2 in direction -1 is -0.6759. Step size: -0.0125.
Optimization successful for k2=-0.6759. Start fval -25.170731, end fval -25.170733.
Next guess for k2 in direction -1 is -0.6913. Step size: -0.0154.
Optimization successful for k2=-0.6913. Start fval -25.078022, end fval -25.078026.
Next guess for k2 in direction -1 is -0.7103. Step size: -0.0190.
Optimization successful for k2=-0.7103. Start fval -24.938996, end fval -24.939007.
Next guess for k2 in direction -1 is -0.7339. Step size: -0.0235.
Optimization successful for k2=-0.7339. Start fval -24.730535, end fval -24.730553.
Next guess for k2 in direction -1 is -0.7630. Step size: -0.0292.
Optimization successful for k2=-0.7630. Start fval -24.417949, end fval -24.417997.
Next guess for k2 in direction -1 is -0.7993. Step size: -0.0363.
Optimization successful for k2=-0.7993. Start fval -23.949240, end fval -23.949350.
Next guess for k2 in direction -1 is -0.8446. Step size: -0.0453.
Optimization successful for k2=-0.8446. Start fval -23.246432, end fval -23.246668.
Next guess for k2 in direction 1 is -0.5535. Step size: 0.0548.
Optimization successful for k2=-0.5535. Start fval -25.227917, end fval -25.228222.
Next guess for k2 in direction 1 is -0.5414. Step size: 0.0122.
Optimization successful for k2=-0.5414. Start fval -25.164263, end fval -25.164278.
Next guess for k2 in direction 1 is -0.5265. Step size: 0.0148.
Optimization successful for k2=-0.5265. Start fval -25.068396, end fval -25.068400.
Next guess for k2 in direction 1 is -0.5085. Step size: 0.0181.
Optimization successful for k2=-0.5085. Start fval -24.924663, end fval -24.924670.
Next guess for k2 in direction 1 is -0.4865. Step size: 0.0220.
Optimization successful for k2=-0.4865. Start fval -24.709183, end fval -24.709200.
Next guess for k2 in direction 1 is -0.4597. Step size: 0.0267.
Optimization successful for k2=-0.4597. Start fval -24.386118, end fval -24.386156.
Next guess for k2 in direction 1 is -0.4273. Step size: 0.0325.
Optimization successful for k2=-0.4273. Start fval -23.901676, end fval -23.901749.
Next guess for k2 in direction 1 is -0.3879. Step size: 0.0394.
Optimization successful for k2=-0.3879. Start fval -23.175054, end fval -23.175193.
CPU times: user 6.82 s, sys: 104 ms, total: 6.93 s
Wall time: 6.9 s
The maximum projections can now be inspected via:
[18]:
# adapt x_labels..
x_labels = [f"Log10({name})" for name in problem.x_names]
visualize.profiles(result, x_labels=x_labels, show_bounds=True);
The plot shows that both parameters, k1 and k2, are identifiable, since the profiled likelihood is tightly peaked around the optimal parameter values and crosses the confidence threshold within a narrow range in both directions.
Furthermore, pyPESTO allows visualizing the confidence intervals directly via
[19]:
ax = pypesto.visualize.profile_cis(
    result, confidence_level=0.95, show_bounds=True
)
ax.set_xlabel("Log10(Parameter value)");
Sampling
In pyPESTO, sampling from the posterior distribution can be performed as follows:
[20]:
import pypesto.sample as sample
n_samples = 1000
sampler = sample.AdaptiveMetropolisSampler()
result = sample.sample(
    problem, n_samples=n_samples, sampler=sampler, result=result
)
Elapsed time: 0.8194874929999987
Sampling results are stored in result.sample_result
and can be accessed e.g., via
[21]:
result.sample_result["trace_x"]
[21]:
array([[[-0.25416795, -0.60834125],
[-0.25416795, -0.60834125],
[-0.25416795, -0.60834125],
...,
[-0.26079462, -0.62055893],
[-0.26079462, -0.62055893],
[-0.26079462, -0.62055893]]])
Sampling Diagnostics
Geweke’s test assesses the convergence of a sampling run and determines the burn-in, i.e., the number of initial samples to discard. The effective sample size quantifies the correlation between successive samples: the stronger the autocorrelation, the smaller the effective sample size.
[22]:
sample.geweke_test(result=result)
result.sample_result["burn_in"]
Geweke burn-in index: 0
[22]:
0
[23]:
sample.effective_sample_size(result=result)
result.sample_result["effective_sample_size"]
Estimated chain autocorrelation: 7.5261491121604935
Estimated effective sample size: 117.40352963946118
[23]:
117.40352963946118
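From the trace (after removing the burn-in), marginal credible intervals can be computed directly with numpy percentiles. A sketch, assuming the chain x iteration x parameter layout of trace_x shown above:
burn_in = result.sample_result["burn_in"]
trace = result.sample_result["trace_x"][0, burn_in:, :]  # first chain, post burn-in
ci_lb, ci_ub = np.percentile(trace, [2.5, 97.5], axis=0)
for name, lo, hi in zip(problem.x_names, ci_lb, ci_ub):
    print(f"{name}: 95% CI [{lo:.3f}, {hi:.3f}]")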
Visualization of Sampling Results
[24]:
# scatter plots
visualize.sampling_scatter(result)
# marginals
visualize.sampling_1d_marginals(result);
Sampler Choice:
Similar to parameter optimization, pyPESTO provides a unified interface to several sampling toolboxes, as well as its own sampler implementations:
Adaptive Metropolis: sample.AdaptiveMetropolisSampler()
Adaptive parallel tempering: sample.ParallelTemperingSampler()
Interface to pymc3 via sample.Pymc3Sampler()
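For multimodal posteriors, a tempered sampler can be set up analogously to the adaptive Metropolis run above. A sketch, assuming ParallelTemperingSampler accepts an internal sampler and a number of chains:
sampler_pt = sample.ParallelTemperingSampler(
    internal_sampler=sample.AdaptiveMetropolisSampler(), n_chains=3
)
result_pt = sample.sample(
    problem, n_samples=1000, sampler=sampler_pt, result=result
)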
5. Storage
You can store the results of an analysis to an .hdf5 file, and load them again, via the pypesto.store module.
Store result
[25]:
import tempfile
import pypesto.store as store
# create a temporary file, for demonstration purposes
f_tmp = tempfile.NamedTemporaryFile(suffix=".hdf5", delete=False)
result_file_name = f_tmp.name
# store the result
store.write_result(result, result_file_name)
f_tmp.close()
Load result file
You can re-load a result, e.g. for visualizations:
[26]:
# read result
result_loaded = store.read_result(result_file_name)
# e.g. do some visualisation
visualize.waterfall(result_loaded);
Software Development Standards:
PyPESTO is developed with the following standards:
Open source, code on GitHub.
Pip installable via pip install pypesto.
Documentation available on Read the Docs, together with example Jupyter notebooks.
Has continuous integration & extensive automated testing.
Code reviews before merging into the develop/main branch.
Currently, 5–10 people are using, extending and (most importantly) maintaining pyPESTO in their “daily business”.
Further topics
Further features are available, among them:
Model Selection
Hierarchical Optimization of scaling/noise parameters
Categorical Data