Optimization with Synthetic Data

In this notebook, optimization is performed with an SBML model and PEtab problem, which includes some measurements.

Next, optimization is performed with synthetic data as measurements, which is generated using PEtab and AMICI. The ability to recover the parameter vector that was used to generate the synthetic data is demonstrated.


Additional requirements for this notebook can be installed with pip install amici petab.

# install if not done yet
# !apt install libatlas-base-dev swig
# %pip install pypesto[amici,petab] --quiet

1. Load required packages. PEtab provides a base class that is designed to be easily extended to support simulation with different tools. Here, the AMICI implementation of this base class is used.

import amici.petab_simulate
import petab

import pypesto.optimize
import pypesto.petab
import pypesto.visualize

# Helper function to get the maximum likelihood estimate as a dictionary from a pyPESTO optimization result.
def get_x_mle(optimize_result, pypesto_problem, petab_problem, scaled=True):
    if not scaled:
        scaling = petab.parameters.get_optimization_parameter_scaling(
    return {
        x_id: (
            petab.parameters.unscale(x_value, scaling[x_id])
            if not scaled
            else x_value
        for x_id, x_value in zip(
            pypesto_problem.x_names, optimize_result[0]["x"]
        # if x_id in scaling

Standard Optimization

The PEtab problem is used to generate a pyPESTO problem, which is used to estimate model parameters.

2. Load a PEtab problem. The synthetic data returned by the PEtab-derived synthetic data generator (later, an instance of amici.petab_simulate.PetabSimulator) will be equivalent to replacing the measurements in the PEtab problem’s measurements table with simulated values.

petab_yaml_filename = "conversion_reaction/conversion_reaction.yaml"
petab_problem_original = petab.Problem.from_yaml(petab_yaml_filename)

3. Create a pyPESTO problem from the PEtab problem. Here, the original PEtab problem is used for parameter estimation (no synthetic data is generated).

pypesto_importer_original = pypesto.petab.PetabImporter(petab_problem_original)
pypesto_problem_original = pypesto_importer_original.create_problem(
Visualization table not available. Skipping.

4. Estimate parameters. Multi-start local optimization with 100 starts is used, with the default pyPESTO optimizer.

pypesto_result_original = pypesto.optimize.minimize(
    pypesto_problem_original, n_starts=20

5. Visualize parameter estimation. Here, estimated values for k1 and k2 are shown, then a waterfall plot to indicate optimization quality, then a plot of the estimated parameters from the different starts to indicate identifiability.

Here, parameter estimation appears to have been successful. In the case of problematic parameter estimation, synthetic data can be used to determine whether parameter estimation can be used to identify known parameter values.

x_mle_unscaled_original = get_x_mle(
print("Parameters are estimated to be (linear scale):")
            f"{x_id}: {x_value}"
            for x_id, x_value in x_mle_unscaled_original.items()

Parameters are estimated to be (linear scale):
k1: 0.7755615897632466
k2: 0.5442529706737183

Synthetic Optimization

Synthetic optimization is similar to a standard optimization, except the PEtab measurements table is replaced with synthetic data that is generated from specified parameters, with noise, and then used for optimization.

Here, parameters are specified with a dictionary that is used to update the original PEtab parameters table. An alternative is use a second PEtab YAML file that is identical to the original, except for the parameters table, which would now contain the parameter values to be used for synthetic data generation.


Noise is added to the simulated data according to the: - noise distribution in the PEtab observables table - noise formula in the PEtab observables table, which is used to calculate the scale of the noise distribution - noise parameters in the PEtab measurements table, which are substituted into the noise formula for measurement-specific noise distribution scales

6. As before, load a PEtab problem. This time, the parameters table is edited to contain parameters values that will be used for synthetic data generation (synthetic_parameters). Then, the simulator is used to generate synthetic data, which replaces the measurements table of the PEtab problem for parameter estimation in the next step.

Here, synthetic data also has noise added (noise=True), which is defined by the PEtab problem as described above. A noise scaling factor can also be specified (here, a small value - noise_scaling_factor=0.01 - is used, to reduce noise such that the synthetic parameters are more likely to be recovered with parameter estimation).

The simulator working directory is then deleted along with its contents.

petab_problem_synthetic = petab.Problem.from_yaml(petab_yaml_filename)

synthetic_parameters = {"k1": 1.5, "k2": 2.5}

simulator = amici.petab_simulate.PetabSimulator(petab_problem_synthetic)
# Optional: the AMICI simulator is provided a model, to avoid recompilation
petab_problem_synthetic.measurement_df = simulator.simulate(
/tmp/ipykernel_372826/2597958043.py:4: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


7. Create a pyPESTO problem from the edited PEtab problem, and estimate parameters.

pypesto_importer_synthetic = pypesto.petab.PetabImporter(
pypesto_problem_synthetic = pypesto_importer_synthetic.create_problem()
pypesto_result_synthetic = pypesto.optimize.minimize(
    pypesto_problem_synthetic, n_starts=20
Visualization table not available. Skipping.

8. Visualize parameter estimation. Here, the estimates for k1 and k2 are similar to the synthetic parameters, suggesting that parameter estimation works well with this PEtab problem and can be used to identify parameter values successfully (caveat: noise is reduced here; parameter estimation can be expected to perform worse with more realistically noisy data).

x_mle_unscaled_synthetic = get_x_mle(
print("Parameters are estimated to be (linear scale):")
            f"{x_id}: {x_value}"
            for x_id, x_value in x_mle_unscaled_synthetic.items()

Parameters are estimated to be (linear scale):
k1: 1.5034772956797575
k2: 2.506518330574491