Optimizer Convergence and Comparison
This notebook is dedicated to additional insights into the optimization process, be it convergence properties or comparisons of two different results.
After this notebook you can…

- use visualization methods to compare optimizers in terms of performance,
- check convergence of a multistart optimization,
- get insights into the convergence history,
- restart an optimization from a history file.
Imports and constants
[1]:
import pypesto.optimize as optimize
import pypesto.visualize as visualize
import pypesto
import logging
import tempfile
import numpy as np
import scipy as sp
import copy
from IPython.display import Markdown, display
np.random.seed(3142)
Creating optimization results
For the purposes of this notebook, we create two optimization results obtained with different optimizers.
[2]:
%%capture
# create problem
objective = pypesto.Objective(
    fun=sp.optimize.rosen,
    grad=sp.optimize.rosen_der,
    hess=sp.optimize.rosen_hess,
)
dim_full = 15
lb = -5 * np.ones((dim_full, 1))
ub = 5 * np.ones((dim_full, 1))
problem = pypesto.Problem(
    objective=objective, lb=lb, ub=ub
)
[3]:
# optimizer
scipy_optimizer = optimize.ScipyOptimizer()
[4]:
# create temporary storage file...
f_scipy = tempfile.NamedTemporaryFile(suffix=".hdf5", delete=False)
fn_scipy = f_scipy.name
# ... and corresponding history options
history_options_scipy = pypesto.HistoryOptions(
    trace_record=True, storage_file=fn_scipy
)
Note to the above:
In practice, you should not use a temporary file, as it is removed after the run while still creating overhead. There are two options you might choose from instead (both are sketched in the code after this list):

1. If you do not plan to save the optimization result, you can use a `MemoryHistory` by removing the `storage_file` argument. This creates no overhead but is more demanding on the memory.
2. If you want to save your results (recommended) for any form of reusability, you can remove `f_$optimizer` and replace the `fn_$optimizer` assignment with `fn_$optimizer = "filename_of_choice.hdf5"`.
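A minimal sketch of both alternatives, reusing the `pypesto.HistoryOptions` setup from above; the file name "scipy_history.hdf5" is just an illustrative choice:

# Option 1: keep the history in memory only (no storage file, no disk overhead)
history_options_memory = pypesto.HistoryOptions(trace_record=True)

# Option 2: store the history in a persistent file of your choice
# (the file name below is an arbitrary example)
fn_scipy = "scipy_history.hdf5"
history_options_scipy = pypesto.HistoryOptions(
    trace_record=True, storage_file=fn_scipy
)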
[5]:
n_starts = 10
# run optimization
result_scipy = optimize.minimize(
    problem=problem,
    optimizer=scipy_optimizer,
    n_starts=n_starts,
    history_options=history_options_scipy,
    filename=fn_scipy,
)
As a first step, we compare the optimizers in terms of final objective function value and robustness through a waterfall plot.
Optimizer convergence
First, we want to check the convergence of a single result. For this, a summary and general visualizations such as waterfall plots can be helpful, but so can the specific `optimizer_convergence` visualization as well as history tracing.
[6]:
display(Markdown(result_scipy.summary()))
Optimization Result
number of starts: 10
best value: 1.1155732675693096e-10, id=8
worst value: 3.98662381387951, id=9
number of non-finite values: 0
execution time summary:
Mean execution time: 1.005s
Maximum execution time: 1.557s, id=6
Minimum execution time: 0.646s, id=1
summary of optimizer messages:
| Count | Message |
|-------|---------|
| 10 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH |
best value found (approximately) 9 time(s)
number of plateaus found: 1
A summary of the best run:
Optimizer Result
optimizer used: <ScipyOptimizer method=L-BFGS-B options={'disp': False, 'maxfun': 1000}>
message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
number of evaluations: 110
time taken to optimize: 0.976s
startpoint: [ 2.49775229 -4.93575586 -0.96507193 -4.90511172 -3.96909126 0.34152167 -2.05712605 3.32840348 -3.07422127 1.11162062 1.98350063 -2.75493257 2.90622105 -4.58360239 -4.29519499]
endpoint: [0.99999964 0.99999968 0.99999991 1.00000019 1.00000033 1.00000029 1.00000019 1.00000008 1.00000002 1.00000004 1.00000005 1.00000027 1.00000073 1.00000157 1.00000315]
final objective value: 1.1155732675693096e-10
final gradient value: [-2.63397252e-04 -2.50546409e-04 -1.04567957e-05 1.26001883e-04 2.46768531e-04 9.44915065e-05 6.63017778e-05 -8.47722933e-05 -1.89699575e-04 2.35789886e-04 -2.80974512e-04 6.87135979e-05 2.26009811e-05 -1.98121243e-05 1.58263112e-05]
[7]:
# waterfall plot
visualize.waterfall(result_scipy);
The waterfall plot is an overview of the final objective function values, ordered from best to worst. Similar colors indicate similar function values and potential local optima/manifolds. In the best-case scenario, all values are assigned to a plateau indicating a local optimum, and the best value is found more than once. Additionally, we might want to check whether the gradients converged as well and whether we can find a pattern in the specific reasons to stop:
[8]:
visualize.optimizer_convergence(result_scipy);
We usually want the gradients to be very low, in order to ensure that we are actually in a local optimum. If the results do not seem entirely promising, we might want to switch optimizers altogether, as different optimizers sometimes perform better on other problems. Additionally, one can try adjusting the options of the current optimizer.
[9]:
# switch to fides optimizer
fides_optimizer = optimize.FidesOptimizer(
    verbose=logging.WARN
)
f_fides = tempfile.NamedTemporaryFile(suffix=".hdf5", delete=False)
fn_fides = f_fides.name
history_options_fides = pypesto.HistoryOptions(
    trace_record=True, storage_file=fn_fides
)
result_fides = optimize.minimize(
    problem=problem,
    optimizer=fides_optimizer,
    n_starts=n_starts,
    history_options=history_options_fides,
    filename=fn_fides,
)
[10]:
visualize.waterfall([result_fides, result_scipy], legends=["Fides Optimizer", "Scipy Optimizer"]);
We can also compare various metrics of the results, such as the run time and the number of function evaluations.
[11]:
visualize.optimization_run_properties_per_multistart(
    [result_fides, result_scipy],
    properties_to_plot=["time", "n_fval"],
    legends=["Fides Optimizer", "Scipy Optimizer"],
);
We might also want to check how close the parameter estimates are to each other; for this, we can employ the parameter visualization.
[12]:
visualize.parameters(result_fides);
We can also check what the optimization trajectory looks like during the different runs. This can reveal further reasons for the optimization to stop, such as very flat landscapes. For this we use the history:
[13]:
visualize.optimizer_history(result_fides, trace_y="fval")
visualize.optimizer_history(result_fides, trace_y="gradnorm");
Reference point is currently only implemented for trace_y == fval and will not be plotted for trace_y == gradnorm.
As we can see, the function values are not monotonic. This is because the history traces line search evaluations as well, which allows us to investigate potential problems. Recurring patterns in the gradient norm together with minuscule to no changes in the function values indicate that the optimizer is not really able to find a next point, or is taking spiraling steps. In both cases, the actual optimum is very hard to pinpoint. Lowering tolerances, increasing the number of start points (up to a certain point), and switching optimizers are all valid strategies to overcome such issues. However, there is no recipe that works for all models, and thus it is always important to investigate the optimization in terms of convergence, termination reasons, and function evaluations, to get ideas on what to do next.
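As an illustration of the first two strategies, here is a minimal sketch (not run in this notebook) that tightens the tolerances of the scipy L-BFGS-B optimizer via its `options` dict and doubles the number of starts; the specific tolerance values are arbitrary example choices:

# tighten tolerances (ftol/gtol are scipy's L-BFGS-B option names)
strict_scipy_optimizer = optimize.ScipyOptimizer(
    method="L-BFGS-B",
    options={"ftol": 1e-12, "gtol": 1e-8, "maxfun": 1000},
)
# ... and increase the number of starts
result_scipy_strict = optimize.minimize(
    problem=problem,
    optimizer=strict_scipy_optimizer,
    n_starts=2 * n_starts,
)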
Reloading from History
Especially when running large models on clusters, the optimization sometimes may stop due to unfortunate reasons (e.g., timeouts). In these cases, the history serves yet another purpose: retrieving finished and unfinished optimizations. Sometimes out of 100 starts, 80 might have already been terminated. Investigating those 80 might already yield good results. In other cases, we might want to continue optimization from where we left off.
Loading Results from History
To load the results from history, we use the `optimization_result_from_history` function from the `optimize` module. This function allows us to retrieve the state of the optimization process stored in one HDF5 file.
[14]:
# Load result from history
result_from_history = optimize.optimization_result_from_history(problem=problem, filename=fn_fides)
result_from_history.problem = problem
Verifying the Loaded Results
We can check that the visualization of the loaded results matches the original results. This ensures that the history has been loaded correctly.
[15]:
# Visualize the optimization history
visualize.optimizer_history(result_from_history, trace_y="fval")
visualize.parameters(
    [result_fides, result_from_history],
    colors=[[0.8, 0.2, 0.2, 0.5], [0.2, 0.2, 0.8, 0.2]],
);
We compare the function value trace of the loaded results with the original results to ensure consistency.
[16]:
result_from_history.optimize_result[0].history.get_fval_trace() == result_fides.optimize_result[0].history.get_fval_trace()
[16]:
True
Continuing Optimization
It’s important to note that we did not use the result saved in HDF5 to fill in the loaded result, but solely the history. This means that if you use an `Hdf5History` during optimization and the optimization is interrupted, you can still load the optimization up to that point. From there, you can, for example, restart the unfinished (e.g., the last 3) starts and finish optimizing them:
[17]:
# we set x_guesses within the problem, to tell the optimizer which start points to use
continued_problem = copy.deepcopy(problem)
continued_problem.set_x_guesses(result_from_history.optimize_result.x[-3:])

continued_result = optimize.minimize(
    problem=continued_problem,
    optimizer=fides_optimizer,
    n_starts=3,
)
Notes on Hdf5History with MultiProcessEngine
If you use both `Hdf5History` and `MultiProcessEngine` together, the history works as follows (a sketch of this setup is shown after the list):

1. You specify a history file `fn_history` to store the results in.
2. A folder with the name `fn_history` is created at the start of optimization.
3. Each optimization gets its own separate HDF5 file within the folder, named `fn_history_{id}`.
4. After optimization, the `fn_history` file is created, linking each result within this file. This allows you to work easily with the `fn_history` file, which pools together all other histories.
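A minimal sketch of such a parallel setup, reusing the problem and optimizer from above; `fn_history` is a hypothetical file name and `MultiProcessEngine` is used with its default number of processes:

from pypesto.engine import MultiProcessEngine

# hypothetical history file; per-start files are written to a folder of the
# same name (fn_history_{id}), and fn_history links them after optimization
fn_history = "fn_history.hdf5"
history_options_mp = pypesto.HistoryOptions(
    trace_record=True, storage_file=fn_history
)
result_mp = optimize.minimize(
    problem=problem,
    optimizer=fides_optimizer,
    n_starts=n_starts,
    engine=MultiProcessEngine(),
    history_options=history_options_mp,
    filename=fn_history,
)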
However, if your optimization is interrupted, there is no `fn_history` file for pooling. In this case, `optimization_result_from_history` does not work nicely for all of them together. Instead, you can load each result separately through `optimization_result_from_history` and append the resulting `optimize_result` objects together.
Pseudo-code for this would look as follows:
import os

from pypesto import Result
from pypesto.optimize import optimization_result_from_history

# The history directory is a folder containing all individual history files,
# located in the same place as the fn_history file would have been
history_dir = "fn_history"

# Get a list of all individual history files
history_files = [
    os.path.join(history_dir, f)
    for f in os.listdir(history_dir)
    if f.endswith((".h5", ".hdf5"))
]

# Each file holds exactly one optimization result; load them all and pool them
result = Result(problem=problem)
result.optimize_result.list = [
    optimization_result_from_history(problem=problem, filename=history_file).optimize_result[0]
    for history_file in history_files
]
result.optimize_result.sort()