Optimizer Convergence and Comparison

This notebook provides additional insights into the optimization process, covering both convergence properties and the comparison of different optimization results.

After this notebook, you can…

  • use visualization methods to compare optimizers in terms of performance,

  • check the convergence of a multistart optimization,

  • get insights into the convergence history,

  • restart an optimization from a history file.

Imports and constants

[1]:
import copy
import logging
import tempfile

import numpy as np
import scipy as sp
from IPython.display import Markdown, display

import pypesto
import pypesto.optimize as optimize
import pypesto.visualize as visualize

np.random.seed(3142)

Creating optimization results

For the purpose of this notebook, we create two optimization results obtained with different optimizers.

[2]:
%%capture
# create problem
objective = pypesto.Objective(
    fun=sp.optimize.rosen,
    grad=sp.optimize.rosen_der,
    hess=sp.optimize.rosen_hess,
)
dim_full = 15
lb = -5 * np.ones((dim_full, 1))
ub = 5 * np.ones((dim_full, 1))

problem = pypesto.Problem(
    objective=objective, lb=lb, ub=ub
)
[3]:
# optimizer
scipy_optimizer = optimize.ScipyOptimizer()
[4]:
# create a temporary storage file...
f_scipy = tempfile.NamedTemporaryFile(suffix=".hdf5", delete=False)
fn_scipy = f_scipy.name

# ... and corresponding history option
history_options_scipy = pypesto.HistoryOptions(
    trace_record=True, storage_file=fn_scipy
)

Note on the above:

In practice, you should not use a temporary file, as it is removed after the run while still creating overhead. There are two options you might choose from instead (see the sketch after this list):

  • If you do not plan to save the optimization result, you can use a MemoryHistory by removing the storage_file argument. This creates no overhead but is more demanding on memory.

  • If you want to save your results (recommended) for any form of reusability, you can remove f_$optimizer and replace the fn_$optimizer assignment with fn_$optimizer = "filename_of_choice.hdf5".
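A minimal sketch of both options, assuming an arbitrary, illustrative filename scipy_history.hdf5 for the persistent case:

# option 1: in-memory history only -- simply omit storage_file
history_options_memory = pypesto.HistoryOptions(trace_record=True)

# option 2: persistent storage file instead of a temporary one
# (the filename is an arbitrary choice)
fn_scipy = "scipy_history.hdf5"
history_options_scipy = pypesto.HistoryOptions(
    trace_record=True, storage_file=fn_scipy
)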

[5]:
n_starts = 10

# run optimization
result_scipy = optimize.minimize(
    problem=problem,
    optimizer=scipy_optimizer,
    n_starts=n_starts,
    history_options=history_options_scipy,
    filename=fn_scipy,
)

In a first step, we compare the optimizers in terms of the final objective function value and robustness by means of a waterfall plot.

Optimizer convergence

First, we want to check the convergence of a single result. For this, a summary and general visualizations such as waterfall plots can be helpful, as can the dedicated optimizer_convergence visualization and history tracing.

[6]:
display(Markdown(result_scipy.summary()))

Optimization Result

  • number of starts: 10

  • best value: 1.1155732675693096e-10, id=8

  • worst value: 3.98662381387951, id=9

  • number of non-finite values: 0

  • execution time summary:

    • Mean execution time: 1.005s

    • Maximum execution time: 1.557s, id=6

    • Minimum execution time: 0.646s, id=1

  • summary of optimizer messages:

    Count   Message
    10      CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH

  • best value found (approximately) 9 time(s)

  • number of plateaus found: 1

A summary of the best run:

Optimizer Result

  • optimizer used: <ScipyOptimizer method=L-BFGS-B options={‘disp’: False, ‘maxfun’: 1000}>

  • message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH

  • number of evaluations: 110

  • time taken to optimize: 0.976s

  • startpoint: [ 2.49775229 -4.93575586 -0.96507193 -4.90511172 -3.96909126 0.34152167 -2.05712605 3.32840348 -3.07422127 1.11162062 1.98350063 -2.75493257 2.90622105 -4.58360239 -4.29519499]

  • endpoint: [0.99999964 0.99999968 0.99999991 1.00000019 1.00000033 1.00000029 1.00000019 1.00000008 1.00000002 1.00000004 1.00000005 1.00000027 1.00000073 1.00000157 1.00000315]

  • final objective value: 1.1155732675693096e-10

  • final gradient value: [-2.63397252e-04 -2.50546409e-04 -1.04567957e-05 1.26001883e-04 2.46768531e-04 9.44915065e-05 6.63017778e-05 -8.47722933e-05 -1.89699575e-04 2.35789886e-04 -2.80974512e-04 6.87135979e-05 2.26009811e-05 -1.98121243e-05 1.58263112e-05]

[7]:
# waterfall plot
visualize.waterfall(result_scipy);
../_images/example_history_usage_16_0.png

The waterfall plot gives an overview of the final objective function values, ordered from best to worst. Similar colors indicate similar function values and thus potential local optima/manifolds. In the best-case scenario, all values are assigned to a plateau indicating a local optimum, and the best value is found more than once. Additionally, we might want to check whether the gradients converged as well, and whether we can find a pattern in the specific reasons to stop:

[8]:
visualize.optimizer_convergence(result_scipy);
../_images/example_history_usage_18_0.png

We usually want the gradients to be very small in order to ensure that we are actually in a local optimum. If the results do not seem entirely promising, we might want to adjust the optimizer's settings or switch optimizers altogether, as different optimizers sometimes perform better on different problems.

[9]:
# switch to fides optimizer
fides_optimizer = optimize.FidesOptimizer(
    verbose=logging.WARN
)
f_fides = tempfile.NamedTemporaryFile(suffix=".hdf5", delete=False)
fn_fides = f_fides.name
history_options_fides = pypesto.HistoryOptions(
    trace_record=True, storage_file=fn_fides
)
result_fides = optimize.minimize(
    problem=problem,
    optimizer=fides_optimizer,
    n_starts=n_starts,
    history_options=history_options_fides,
    filename=fn_fides,
)
[10]:
visualize.waterfall([result_fides, result_scipy], legends=["Fides Optimizer", "Scipy Optimizer"]);
../_images/example_history_usage_21_0.png

We can also compare various metrics of the results, such as execution time and the number of function evaluations:

[11]:
visualize.optimization_run_properties_per_multistart(
        [result_fides, result_scipy], properties_to_plot=['time', 'n_fval'],
        legends=["Fides Optimizer", "Scipy Optimizer"]
);
../_images/example_history_usage_23_0.png

We might also want to check how close the parameter estimates are to each other. For this, we can employ the parameter visualization:

[12]:
visualize.parameters(result_fides);
../_images/example_history_usage_25_0.png

We can also check what the optimization trajectories look like during the different runs. This can reveal additional reasons for the optimization to stop, such as very flat landscapes. For this, we use the history:

[13]:
visualize.optimizer_history(result_fides, trace_y="fval")
visualize.optimizer_history(result_fides, trace_y="gradnorm");
Reference point is currently only implemented for trace_y == fval and will not be plotted for trace_y == gradnorm.
../_images/example_history_usage_27_1.png
../_images/example_history_usage_27_2.png

As we can see, the function values are not monotonic. This is because the history also traces line-search evaluations, which allows us to investigate potential problems. Recurring patterns in the gradient norm, together with minuscule or no changes in the function values, indicate that the optimizer is unable to find a suitable next point or is taking spiraling steps. In both cases, the actual optimum is very hard to pinpoint. Lowering tolerances, increasing the number of start points (up to a certain point), and switching optimizers are all valid strategies for overcoming such issues. However, there is no recipe that works for all models, so it is always important to investigate the optimization in terms of convergence, termination reasons, and function evaluations to get ideas on what to do next.
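As an illustrative sketch of the first two strategies (the option values below are placeholders, not recommendations): for scipy's L-BFGS-B, tolerances are passed via its options dictionary, and the number of start points is the n_starts argument of optimize.minimize.

# tighter tolerances for scipy's L-BFGS-B (values are purely illustrative)
tight_scipy_optimizer = optimize.ScipyOptimizer(
    method="L-BFGS-B",
    options={"ftol": 1e-12, "gtol": 1e-8, "maxfun": 10000},
)

# a larger number of start points for a more thorough multistart
result_tight = optimize.minimize(
    problem=problem,
    optimizer=tight_scipy_optimizer,
    n_starts=50,
)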

Reloading from History

Especially when running large models on clusters, the optimization may stop for unfortunate reasons (e.g., timeouts). In these cases, the history serves yet another purpose: retrieving finished and unfinished optimizations. Sometimes, 80 out of 100 starts might already have finished, and investigating those 80 may already yield good results. In other cases, we might want to continue the optimization from where we left off.

Loading Results from History

To load the results from history, we use the optimization_result_from_history function from the optimize module. This function allows us to retrieve the state of the optimization process stored in one HDF5 file.

[14]:
# Load result from history
result_from_history = optimize.optimization_result_from_history(problem=problem, filename=fn_fides)
result_from_history.problem = problem

Verifying the Loaded Results

We can check that the visualization of the loaded results matches the original results. This ensures that the history has been loaded correctly.

[15]:
# Visualize the optimization history
visualize.optimizer_history(result_from_history, trace_y="fval")
visualize.parameters(
    [result_fides, result_from_history],
    colors=[[0.8, 0.2, 0.2, 0.5], [0.2, 0.2, 0.8, 0.2]]
);
../_images/example_history_usage_32_0.png
../_images/example_history_usage_32_1.png

We compare the function value trace of the loaded results with the original results to ensure consistency.

[16]:
result_from_history.optimize_result[0].history.get_fval_trace() == result_fides.optimize_result[0].history.get_fval_trace()
[16]:
True
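The same check can be extended to all starts. A small sketch, assuming both results are sorted in the same order (which holds here, since both are sorted by final function value):

# compare the function value traces of all starts
all(
    np.allclose(
        res_loaded.history.get_fval_trace(),
        res_orig.history.get_fval_trace(),
    )
    for res_loaded, res_orig in zip(
        result_from_history.optimize_result.list,
        result_fides.optimize_result.list,
    )
)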

Continuing Optimization

It’s important to note that we did not use the result saved in HDF5 to fill in the loaded result, but solely the history. This means that if you use an Hdf5History during optimization and the optimization is interrupted, you can still load the optimization up to that point. From there, you can, for example, restart the unfinished (e.g., the last 3) starts and finish optimizing them:

[17]:
# set x_guesses within the problem to tell the optimization which start points to use
continued_problem = copy.deepcopy(problem)
continued_problem.set_x_guesses(result_from_history.optimize_result.x[-3:])

continued_result = optimize.minimize(
    problem=continued_problem,
    optimizer=fides_optimizer,
    n_starts=3,
)
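
If desired, the continued starts can be merged back with the already finished ones. A minimal sketch reusing the list/sort pattern shown at the end of this notebook (the slicing assumes that the last 3 starts were the unfinished ones):

# combine the finished starts from the history with the newly continued ones
combined_result = pypesto.Result(problem=problem)
combined_result.optimize_result.list = (
    result_from_history.optimize_result.list[:-3]
    + continued_result.optimize_result.list
)
combined_result.optimize_result.sort()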

Notes on Hdf5History with MultiProcessEngine

If you use both Hdf5History and MultiProcessEngine together, the history works as follows:

  • You specify a history file fn_history to store the results in.

  • A folder with the name fn_history is created at the start of optimization.

  • Each optimization run gets its own separate HDF5 file within that folder, named fn_history_{id}.

  • After optimization, the fn_history file is created, linking each result within this file. This allows you to work easily with the fn_history file, which pools together all other histories.

However, if your optimization is interrupted, there is no fn_history file for pooling. In this case, optimization_result_from_history cannot load all of them at once. Instead, you can load each result separately via optimization_result_from_history and append the optimize_results together.

Pseudo-code for this would look as follows:

import os

from pypesto import Result
from pypesto.optimize import optimization_result_from_history

# the history directory is a folder containing all individual history files,
# located where the fn_history file would have been created
history_dir = "fn_history"

# collect all individual history files
history_files = [
    os.path.join(history_dir, f)
    for f in os.listdir(history_dir)
    if f.endswith((".h5", ".hdf5"))
]

# each file contains exactly one optimization result
result = Result(problem=problem)
result.optimize_result.list = [
    optimization_result_from_history(
        problem=problem, filename=history_file
    ).optimize_result[0]
    for history_file in history_files
]
result.optimize_result.sort()