Parameter estimation using censored data

This Notebook explains the use of censored data for parameter estimation of ODE models. An example model is provided in pypesto/doc/example/example_censored. The implementation supports all three censoring types of measurements:

  • Left censored: a datapoint is below a certain known value.

  • Right censored: a datapoint is above a certain known value.

  • Interval censored: a datapoint is on an interval between two known values.

In all three cases, the exact numerical value of the datapoint is unknown. For the integration of censored measurements, we employ the optimal scaling approach. In this approach, each datapoint is represented by a variable surrogate datapoint which is constrained to be in its respective category. Categories can be thought of as intervals with specific bounds.

For censored data, the category bounds are already known from the censoring value:

  • for left censored data the category bounds are (0, censoring value),

  • for right censored data the category bounds are (censoring value, infinity),

  • for interval censored data the category bounds are given by the interval bounds.

This makes the identification of surrogate data extremely simple, as it can be directly (analytically) calculated.

Details on the optimal scaling approach can be found in Shepard, 1962 (https://doi.org/10.1007/BF02289621). Details on the application of the gradient-based optimal scaling approach to mechanistic modeling with ordinal data can be found in Schmiester et al. 2020 (https://doi.org/10.1007/s00285-020-01522-w) and Schmiester et al. 2021 (https://doi.org/10.1093/bioinformatics/btab512).

Import model from the petab_problem

[1]:
import matplotlib.pyplot as plt
import numpy as np
import petab

import pypesto
import pypesto.logging
import pypesto.optimize as optimize
from pypesto.petab import PetabImporter
from pypesto.visualize import plot_categories_from_pypesto_result

To use censored data for parameter estimation, in pyPESTO we use the optimal scaling approach. Since the optimal scaling approach is implemented in the hierarchical manner, it requires us to specify hierarchical=True when importing the petab_problem:

[2]:
petab_folder = "./example_censored/"
yaml_file = "example_censored.yaml"

petab_problem = petab.Problem.from_yaml(petab_folder + yaml_file)

importer = PetabImporter(petab_problem, hierarchical=True)
Visualization table not available. Skipping.

The petab_problem has to be specified in the usual PEtab formulation. The censored measurements have to be specified in the measurement.tsv file by adding the censoring type in the measurementType column, where a censoring type can be:

  • left-censored,

  • right-censored,

  • or interval-censored.

If the censoring type is not specified, the measurement will be considered as quantitative. Then, the censoring bound has to be specified in the censoringBounds column. For interval censored measurements the bounds should be separated with a semicolon:

[3]:
from pandas import option_context

with option_context("display.max_colwidth", 400):
    display(petab_problem.measurement_df)
observableId preequilibrationConditionId simulationConditionId measurement time observableParameters noiseParameters observableTransformation noiseDistribution measurementType censoringBounds
0 Activity NaN Inhibitor_0 0.000000 5 NaN 1 lin normal interval-censored 10.0;16.0
1 Activity NaN Inhibitor_3 0.000000 5 NaN 1 lin normal interval-censored 10.0;16.0
2 Activity NaN Inhibitor_10 17.892654 5 NaN 1 lin normal NaN NaN
3 Activity NaN Inhibitor_25 0.000000 5 NaN 1 lin normal right-censored 20.0
4 Activity NaN Inhibitor_35 16.812104 5 NaN 1 lin normal NaN NaN
5 Activity NaN Inhibitor_50 9.173129 5 NaN 1 lin normal NaN NaN
6 Activity NaN Inhibitor_75 4.150928 5 NaN 1 lin normal NaN NaN
7 Activity NaN Inhibitor_100 0.000000 5 NaN 1 lin normal left-censored 3.0
8 Activity NaN Inhibitor_300 0.000000 5 NaN 1 lin normal left-censored 3.0
9 Ybar NaN Inhibitor_0 0.000000 5 NaN 1 lin normal NaN NaN
10 Ybar NaN Inhibitor_3 0.059999 5 NaN 1 lin normal NaN NaN
11 Ybar NaN Inhibitor_10 0.199994 5 NaN 1 lin normal NaN NaN
12 Ybar NaN Inhibitor_25 0.499043 5 NaN 1 lin normal NaN NaN
13 Ybar NaN Inhibitor_35 0.659169 5 NaN 1 lin normal NaN NaN
14 Ybar NaN Inhibitor_50 0.814954 5 NaN 1 lin normal NaN NaN
15 Ybar NaN Inhibitor_75 0.916383 5 NaN 1 lin normal NaN NaN
16 Ybar NaN Inhibitor_100 0.948981 5 NaN 1 lin normal NaN NaN
17 Ybar NaN Inhibitor_300 0.988130 5 NaN 1 lin normal NaN NaN

For censored measurements, the measurement column will be ignored. For the Ybar observable we didn’t specify a measurement type, so those will be used as quantitative.

Note on inclusion of additional data types:

It is possible to include observables with different types of data to the same petab_problem. Refer to the notebooks on using semiquantitative data, relative data and ordinal data for details on integration of other data types. If the measurementType column is left empty for all measurements of an observable, the observable will be treated as quantitative.

Construct the objective and pypesto problem

Now when we construct the objective, it will construct all objects of the optimal scaling inner optimization:

  • OrdinalInnerSolver

  • OrdinalCalculator

  • OrdinalProblem

As there are no censored data specific inner options, we will pass none to the constructor.

[4]:
model = importer.create_model(verbose=False)
objective = importer.create_objective(model=model)
Compiling amici model to folder /home/docs/checkouts/readthedocs.org/user_builds/pypesto/checkouts/stable/doc/example/amici_models/0.23.1/Raf_Mitra_NatCom2018OptimalScaling_3CatQual.
Visualization table not available. Skipping.

Now let’s construct the pyPESTO problem and optimizer. We’re going to use a gradient-based optimizer for a faster optimization, but gradient-free optimizers can be used in the same way:

[5]:
problem = importer.create_problem(objective)

engine = pypesto.engine.MultiProcessEngine(n_procs=3)

optimizer = optimize.ScipyOptimizer(
    method="L-BFGS-B",
    options={"disp": None, "ftol": 2.220446049250313e-09, "gtol": 1e-5},
)
n_starts = 3
np.random.seed(n_starts)

Run optimization using optimal scaling approach

[6]:
np.random.seed(n_starts)

res = optimize.minimize(
    problem, n_starts=n_starts, optimizer=optimizer, engine=engine
)

Visualizing the result

[7]:
from pypesto.visualize import waterfall

waterfall(res)
[7]:
<Axes: title={'center': 'Waterfall plot'}, xlabel='Ordered optimizer run', ylabel='Objective value (offset=-1.554e+01)'>
../_images/example_censored_data_18_1.png

We can plot the censoring categories using the plot_categories_from_pypesto_result plotting function.

[8]:
plot_categories_from_pypesto_result(res, figsize=(15, 10))
plt.show()
../_images/example_censored_data_20_0.png