Parameter estimation using ordinal data

This Notebook explains the use of ordinal data for parameter estimation, as described in Schmiester et al. (2020) and Schmiester et al. (2021). An example model is provided in pypesto/doc/example/example_ordinal.

Ordinal data is defined as the data for which the mutual ordering of the measurements is known. The implementation allows for indistinguishable measurements, i.e. multiple measurements which cannot be ordinally distinguished from each other and are part of the same ordinal level.

For the integration of ordinal data, we employ the optimal scaling approach. In this approach, each ordinal datapoint is defined as being part of a category, where the mutual ordering of categories of the same observable is known. The category interval bounds are numerically optimized and quantitative surrogate measurements are calculated to represent the ordinal measurements. This constitutes the inner subproblem of the hierarchical optimization problem.

Details on the optimal scaling approach can be found in Shepard, 1962 (https://doi.org/10.1007/BF02289621). Details on the application of the gradient-based optimal scaling approach to mechanistic modeling with ordinal data can be found in Schmiester et al. 2020 (https://doi.org/10.1007/s00285-020-01522-w) and Schmiester et al. 2021 (https://doi.org/10.1093/bioinformatics/btab512).

Import model from the petab_problem

[1]:

import matplotlib.pyplot as plt
import numpy as np
import petab

import pypesto
import pypesto.logging
import pypesto.optimize as optimize
from pypesto.hierarchical.optimal_scaling import OptimalScalingInnerSolver
from pypesto.petab import PetabImporter
from pypesto.visualize import plot_categories_from_pypesto_result

To use ordinal data for parameter estimation, in pyPESTO we use the optimal scaling approach as in the referenced papers. Since the optimal scaling approach is implemented in the hierarchical manner, it requires us to specify hierarchical=True when importing the petab_problem:

[2]:

petab_folder = './example_ordinal/'
yaml_file = 'example_ordinal.yaml'

petab_problem = petab.Problem.from_yaml(petab_folder + yaml_file)

importer = PetabImporter(petab_problem, hierarchical=True)

The petab_problem has to be specified in the usual PEtab formulation. The ordinal measurements have to be specified in the measurement.tsv file by adding ordinal in the measurementType column, and the ordering of measurements has to be specified in the measurementCategory column by assigning each measurement to a category:

[3]:

from pandas import option_context

with option_context('display.max_colwidth', 400):
    display(petab_problem.measurement_df)

	observableId	preequilibrationConditionId	simulationConditionId	time	observableParameters	noiseParameters	observableTransformation	noiseDistribution	measurementType	measurementCategory
0	Activity	NaN	Inhibitor_0	5	NaN	1	lin	normal	ordinal	2
1	Activity	NaN	Inhibitor_3	5	NaN	1	lin	normal	ordinal	2
2	Activity	NaN	Inhibitor_10	5	NaN	1	lin	normal	ordinal	3
3	Activity	NaN	Inhibitor_25	5	NaN	1	lin	normal	ordinal	3
4	Activity	NaN	Inhibitor_35	5	NaN	1	lin	normal	ordinal	2
5	Activity	NaN	Inhibitor_50	5	NaN	1	lin	normal	ordinal	1
6	Activity	NaN	Inhibitor_75	5	NaN	1	lin	normal	ordinal	1
7	Activity	NaN	Inhibitor_100	5	NaN	1	lin	normal	ordinal	1
8	Activity	NaN	Inhibitor_300	5	NaN	1	lin	normal	ordinal	1
9	Ybar	NaN	Inhibitor_0	5	NaN	1	lin	normal	ordinal	1
10	Ybar	NaN	Inhibitor_3	5	NaN	1	lin	normal	ordinal	1
11	Ybar	NaN	Inhibitor_10	5	NaN	1	lin	normal	ordinal	1
12	Ybar	NaN	Inhibitor_25	5	NaN	1	lin	normal	ordinal	2
13	Ybar	NaN	Inhibitor_35	5	NaN	1	lin	normal	ordinal	2
14	Ybar	NaN	Inhibitor_50	5	NaN	1	lin	normal	ordinal	3
15	Ybar	NaN	Inhibitor_75	5	NaN	1	lin	normal	ordinal	3
16	Ybar	NaN	Inhibitor_100	5	NaN	1	lin	normal	ordinal	3
17	Ybar	NaN	Inhibitor_300	5	NaN	1	lin	normal	ordinal	3

For ordinal measurements, the measurement column will be ignored.

Measurements with a larger category number will be constrained to be higher in the ordering. Multiple measurements can be assigned to the same category. If this is done, these measurements will be treated as indistinguishable.

Note on inclusion of additional data types:

It is possible to include observables with different types of data to the same petab_problem. Refer to the notebooks on using nonlinear-monotone data and censored data for details on integration of other data types. If the measurementType column is left empty for all measurements of an observable, the observable will be treated as quantitative.

Construct the objective and pypesto problem

Different options can be used for the optimal scaling approach:

method: standard / reduced
reparameterized: True / False
interval_constraints: max / max-min
min_gap: Any float value

It is recommended to use the reduced method with reparameterization as it is the most efficient and robust choice.

When no options are provided, the default is the reduced and reparameterized formulation with max as interval constraint and min_gap=1e-16.

Now when we construct the objective, it will construct all objects of the optimal scaling inner optimization:

OptimalScalingInnerSolver
OptimalScalingAmiciCalculator
OptimalScalingProblem

Specifically, the OptimalScalingInnerSolver will be constructed with default settings.

[4]:

objective = importer.create_objective()

To give non-default options to the OptimalScalingInnerSolver and OptimalScalingProblem, one can pass them as arguments when constructing the objective:

[5]:

objective = importer.create_objective(
    inner_options={
        "method": 'reduced',
        "reparameterized": True,
        "interval_constraints": "max",
        "min_gap": 0.1,
    },
)

Alternatively, we could’ve even given them to the importer constructor, they would be passed all the way to the inner solver.

Now let’s construct the pyPESTO problem and optimizer. We’re going to use a gradint-based optimizer for a faster optimization, but gradient-free optimizers can be used in the same way:

[6]:

problem = importer.create_problem(objective)

engine = pypesto.engine.SingleCoreEngine()

optimizer = optimize.ScipyOptimizer(
    method="L-BFGS-B",
    options={"disp": None, "ftol": 2.220446049250313e-09, "gtol": 1e-5},
)
n_starts = 10
np.random.seed(n_starts)

Run optimization using optimal scaling approach

If we want to change options of the OptimalScalingInnerSolver a new instance of it with new options can be passed to the correct inner calculator. Since in our problem there is only one data type and so one inner problem, we pass it to the first inner calculator problem.objective.calculator.inner_calculators[0].

In case of multiple inner problems, one would have to check the instance of the inner calculator, find the OptimalScalingAmiciCalculator and pass the new inner solver to it.

Run optimization using the reduced and reparameterized approach

[7]:

np.random.seed(n_starts)
problem.objective.calculator.inner_calculators[
    0
].inner_solver = OptimalScalingInnerSolver(
    options={
        'method': 'reduced',
        'reparameterized': True,
        'interval_constraints': 'max',
        'min_gap': 1e-10,
    }
)

res_reduced_reparameterized = optimize.minimize(
    problem, n_starts=n_starts, optimizer=optimizer, engine=engine
)

100%|██████████| 10/10 [00:09<00:00,  1.06it/s]

Run optimization using the reduced non-reparameterized approach

[8]:

np.random.seed(n_starts)
problem.objective.calculator.inner_calculators[
    0
].inner_solver = OptimalScalingInnerSolver(
    options={
        'method': 'reduced',
        'reparameterized': False,
        'interval_constraints': 'max',
        'min_gap': 1e-10,
    }
)

res_reduced = optimize.minimize(
    problem, n_starts=n_starts, optimizer=optimizer, engine=engine
)

100%|██████████| 10/10 [00:09<00:00,  1.04it/s]

Run optimization using the standard approach

[9]:

np.random.seed(n_starts)
problem.objective.calculator.inner_calculators[
    0
].inner_solver = OptimalScalingInnerSolver(
    options={
        'method': 'standard',
        'reparameterized': False,
        'interval_constraints': 'max',
        'min_gap': 1e-10,
    }
)

res_standard = optimize.minimize(
    problem, n_starts=n_starts, optimizer=optimizer, engine=engine
)

100%|██████████| 10/10 [00:09<00:00,  1.04it/s]

Compare results

Reduced formulation leads to improved computation times

[10]:

time_standard = res_standard.optimize_result.get_for_key('time')
print(f"Mean computation time for standard approach: {np.mean(time_standard)}")

time_reduced = res_reduced.optimize_result.get_for_key('time')
print(f"Mean computation time for reduced approach: {np.mean(time_reduced)}")

time_reduced_reparameterized = (
    res_reduced_reparameterized.optimize_result.get_for_key('time')
)
print(
    f"Mean computation time for reduced reparameterized approach: {np.mean(time_reduced_reparameterized)}"
)

Mean computation time for standard approach: 0.9591359615325927
Mean computation time for reduced approach: 0.9587480783462524
Mean computation time for reduced reparameterized approach: 0.9444998025894165

All approaches yield the same objective function values

[11]:

from pypesto.visualize import waterfall

waterfall(
    [res_standard, res_reduced, res_reduced_reparameterized], order_by_id=True
)

[11]:

<AxesSubplot: title={'center': 'Waterfall plot'}, xlabel='Ordered optimizer run', ylabel='Objective value (offset=9.994e-01)'>

../_images/example_example_ordinal_30_1.png

To be sure the function values are the same we can look at them directly as well:

[12]:

for start_id in range(10):
    print(
        [
            [
                res.fval
                for res in result.optimize_result.as_list()
                if res.id == str(start_id)
            ]
            for result in [
                res_standard,
                res_reduced,
                res_reduced_reparameterized,
            ]
        ]
    )

[[0.0019888825676075647], [0.001988882370072136], [0.001988882910838056]]
[[0.0005749912862480574], [0.000574991288188709], [0.0005749912568270656]]
[[0.0019873258450258152], [0.00198732584677828], [0.001987325844380883]]
[[0.000575277818265601], [0.0005752778595524745], [0.0005752778031689591]]
[[0.001988829061937092], [0.0019888290614633627], [0.0019888291358973434]]
[[0.0005764328010595188], [0.0005764329969969911], [0.0005764329960499568]]
[[0.0005747398568712971], [0.0005747398648398072], [0.0005747398437695331]]
[[0.0005735607865200714], [0.0005737543524337683], [0.0005767971102241927]]
[[0.0005735594279397531], [0.0005735611260351738], [0.0005741377033402549]]
[[0.0005735572044450639], [0.000573961371293838], [0.0005736263264256104]]

For any results, we can plot the optimized categories using the plot_categories_from_pypesto_result method.

[13]:

plot_categories_from_pypesto_result(res_standard, figsize=(10, 10))
plt.show()

../_images/example_example_ordinal_34_0.png

In this plot we can see that all surrogate data is inside their respective categories. So the ordering given in the measurements is fully satisfied. This constitutes a perfect fit of the data in the ordinal sense.