Parameter estimation using censored data

This Notebook explains the use of censored data for parameter estimation of ODE models. An example model is provided in pypesto/doc/example/example_censored. The implementation supports all three censoring types of measurements:

Left censored: a datapoint is below a certain known value.
Right censored: a datapoint is above a certain known value.
Interval censored: a datapoint is on an interval between two known values.

In all three cases, the exact numerical value of the datapoint is unknown. For the integration of censored measurements, we employ the optimal scaling approach. In this approach, each datapoint is represented by a variable surrogate datapoint which is constrained to be in its respective category. Categories can be thought of as intervals with specific bounds.

For censored data, the category bounds are already known from the censoring value:

for left censored data the category bounds are (0, censoring value),
for right censored data the category bounds are (censoring value, infinity),
for interval censored data the category bounds are given by the interval bounds.

This makes the identification of surrogate data extremely simple, as it can be directly (analytically) calculated.

Details on the optimal scaling approach can be found in Shepard, 1962 (https://doi.org/10.1007/BF02289621). Details on the application of the gradient-based optimal scaling approach to mechanistic modeling with ordinal data can be found in Schmiester et al. 2020 (https://doi.org/10.1007/s00285-020-01522-w) and Schmiester et al. 2021 (https://doi.org/10.1093/bioinformatics/btab512).

Import model from the petab_problem

[1]:

import matplotlib.pyplot as plt
import numpy as np
import petab

import pypesto
import pypesto.logging
import pypesto.optimize as optimize
from pypesto.petab import PetabImporter
from pypesto.visualize import plot_categories_from_pypesto_result

To use censored data for parameter estimation, in pyPESTO we use the optimal scaling approach. Since the optimal scaling approach is implemented in the hierarchical manner, it requires us to specify hierarchical=True when importing the petab_problem:

[2]:

petab_folder = "./example_censored/"
yaml_file = "example_censored.yaml"

petab_problem = petab.Problem.from_yaml(petab_folder + yaml_file)

importer = PetabImporter(petab_problem, hierarchical=True)

Visualization table not available. Skipping.

The petab_problem has to be specified in the usual PEtab formulation. The censored measurements have to be specified in the measurement.tsv file by adding the censoring type in the measurementType column, where a censoring type can be:

left-censored,
right-censored,
or interval-censored.

If the censoring type is not specified, the measurement will be considered as quantitative. Then, the censoring bound has to be specified in the censoringBounds column. For interval censored measurements the bounds should be separated with a semicolon:

[3]:

from pandas import option_context

with option_context("display.max_colwidth", 400):
    display(petab_problem.measurement_df)

	observableId	preequilibrationConditionId	simulationConditionId	measurement	time	observableParameters	noiseParameters	observableTransformation	noiseDistribution	measurementType	censoringBounds
0	Activity	NaN	Inhibitor_0	0.000000	5	NaN	1	lin	normal	interval-censored	10.0;16.0
1	Activity	NaN	Inhibitor_3	0.000000	5	NaN	1	lin	normal	interval-censored	10.0;16.0
2	Activity	NaN	Inhibitor_10	17.892654	5	NaN	1	lin	normal	NaN	NaN
3	Activity	NaN	Inhibitor_25	0.000000	5	NaN	1	lin	normal	right-censored	20.0
4	Activity	NaN	Inhibitor_35	16.812104	5	NaN	1	lin	normal	NaN	NaN
5	Activity	NaN	Inhibitor_50	9.173129	5	NaN	1	lin	normal	NaN	NaN
6	Activity	NaN	Inhibitor_75	4.150928	5	NaN	1	lin	normal	NaN	NaN
7	Activity	NaN	Inhibitor_100	0.000000	5	NaN	1	lin	normal	left-censored	3.0
8	Activity	NaN	Inhibitor_300	0.000000	5	NaN	1	lin	normal	left-censored	3.0
9	Ybar	NaN	Inhibitor_0	0.000000	5	NaN	1	lin	normal	NaN	NaN
10	Ybar	NaN	Inhibitor_3	0.059999	5	NaN	1	lin	normal	NaN	NaN
11	Ybar	NaN	Inhibitor_10	0.199994	5	NaN	1	lin	normal	NaN	NaN
12	Ybar	NaN	Inhibitor_25	0.499043	5	NaN	1	lin	normal	NaN	NaN
13	Ybar	NaN	Inhibitor_35	0.659169	5	NaN	1	lin	normal	NaN	NaN
14	Ybar	NaN	Inhibitor_50	0.814954	5	NaN	1	lin	normal	NaN	NaN
15	Ybar	NaN	Inhibitor_75	0.916383	5	NaN	1	lin	normal	NaN	NaN
16	Ybar	NaN	Inhibitor_100	0.948981	5	NaN	1	lin	normal	NaN	NaN
17	Ybar	NaN	Inhibitor_300	0.988130	5	NaN	1	lin	normal	NaN	NaN

For censored measurements, the measurement column will be ignored. For the Ybar observable we didn’t specify a measurement type, so those will be used as quantitative.

Note on inclusion of additional data types:

It is possible to include observables with different types of data to the same petab_problem. Refer to the notebooks on using semiquantitative data, relative data and ordinal data for details on integration of other data types. If the measurementType column is left empty for all measurements of an observable, the observable will be treated as quantitative.

Construct the objective and pypesto problem

Now when we construct the objective, it will construct all objects of the optimal scaling inner optimization:

OrdinalInnerSolver
OrdinalCalculator
OrdinalProblem

As there are no censored data specific inner options, we will pass none to the constructor.

[4]:

model = importer.create_model(verbose=False)
objective = importer.create_objective(model=model)

Compiling amici model to folder /home/docs/checkouts/readthedocs.org/user_builds/pypesto/checkouts/stable/doc/example/amici_models/0.23.1/Raf_Mitra_NatCom2018OptimalScaling_3CatQual.

Visualization table not available. Skipping.

Now let’s construct the pyPESTO problem and optimizer. We’re going to use a gradient-based optimizer for a faster optimization, but gradient-free optimizers can be used in the same way:

[5]:

problem = importer.create_problem(objective)

engine = pypesto.engine.MultiProcessEngine(n_procs=3)

optimizer = optimize.ScipyOptimizer(
    method="L-BFGS-B",
    options={"disp": None, "ftol": 2.220446049250313e-09, "gtol": 1e-5},
)
n_starts = 3
np.random.seed(n_starts)

Run optimization using optimal scaling approach

[6]:

np.random.seed(n_starts)

res = optimize.minimize(
    problem, n_starts=n_starts, optimizer=optimizer, engine=engine
)

Visualizing the result

[7]:

from pypesto.visualize import waterfall

waterfall(res)

[7]:

<Axes: title={'center': 'Waterfall plot'}, xlabel='Ordered optimizer run', ylabel='Objective value (offset=-1.554e+01)'>

../_images/example_censored_data_18_1.png

We can plot the censoring categories using the plot_categories_from_pypesto_result plotting function.

[8]:

plot_categories_from_pypesto_result(res, figsize=(15, 10))
plt.show()

../_images/example_censored_data_20_0.png