.. highlight:: c++

.. default-domain:: cpp

.. _chapter-gradient_problem_solver:

==================================
General Unconstrained Minimization
==================================

Modeling
========

:class:`FirstOrderFunction`
---------------------------

.. class:: FirstOrderFunction

   Instances of :class:`FirstOrderFunction` implement the evaluation of
   a function and its gradient.

   .. code-block:: c++

      class FirstOrderFunction {
       public:
        virtual ~FirstOrderFunction() {}
        virtual bool Evaluate(const double* const parameters,
                              double* cost,
                              double* gradient) const = 0;
        virtual int NumParameters() const = 0;
      };

.. function:: bool FirstOrderFunction::Evaluate(const double* const parameters, double* cost, double* gradient) const

   Evaluate the cost/value of the function. If ``gradient`` is not
   ``NULL`` then evaluate the gradient too. If the evaluation is
   successful return ``true``, else return ``false``.

   ``cost`` is guaranteed to never be ``NULL``, ``gradient`` can be ``NULL``.

.. function:: int FirstOrderFunction::NumParameters() const

   Number of parameters in the domain of the function.

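For illustration, here is a minimal sketch of a :class:`FirstOrderFunction`
that evaluates Rosenbrock's function
:math:`f(x, y) = (1 - x)^2 + 100(y - x^2)^2` and its gradient, in the spirit
of the ``rosenbrock.cc`` example that ships with Ceres. Note how ``Evaluate``
only fills in the gradient when a non-``NULL`` pointer is supplied.

.. code-block:: c++

   // Sketch of a FirstOrderFunction for Rosenbrock's function
   // f(x, y) = (1 - x)^2 + 100 (y - x^2)^2.
   class Rosenbrock : public ceres::FirstOrderFunction {
    public:
     virtual ~Rosenbrock() {}

     virtual bool Evaluate(const double* const parameters,
                           double* cost,
                           double* gradient) const {
       const double x = parameters[0];
       const double y = parameters[1];

       cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
       if (gradient != NULL) {
         // df/dx = -2 (1 - x) - 400 x (y - x^2), df/dy = 200 (y - x^2).
         gradient[0] = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x);
         gradient[1] = 200.0 * (y - x * x);
       }
       return true;
     }

     virtual int NumParameters() const { return 2; }
   };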

:class:`GradientProblem`
------------------------

.. class:: GradientProblem

.. code-block:: c++

   class GradientProblem {
    public:
     explicit GradientProblem(FirstOrderFunction* function);
     GradientProblem(FirstOrderFunction* function,
                     LocalParameterization* parameterization);
     int NumParameters() const;
     int NumLocalParameters() const;
     bool Evaluate(const double* parameters, double* cost, double* gradient) const;
     bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
   };

Instances of :class:`GradientProblem` represent general non-linear
optimization problems that must be solved using just the value of the
objective function and its gradient. Unlike the :class:`Problem`
class, which can only be used to model non-linear least squares
problems, instances of :class:`GradientProblem` are not restricted in
the form of the objective function.

Structurally :class:`GradientProblem` is a composition of a
:class:`FirstOrderFunction` and optionally a
:class:`LocalParameterization`.

The :class:`FirstOrderFunction` is responsible for evaluating the cost
and gradient of the objective function.

The :class:`LocalParameterization` is responsible for going back and
forth between the ambient space and the local tangent space. When a
:class:`LocalParameterization` is not provided, the tangent space is
assumed to coincide with the ambient Euclidean space that the
gradient vector lives in.

The constructor takes ownership of the :class:`FirstOrderFunction` and
:class:`LocalParameterization` objects passed to it.
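
As a sketch of the two constructors, the first problem below lives in the
ambient Euclidean space, while the second pairs a hypothetical four parameter
cost function ``QuaternionEnergy`` with a :class:`QuaternionParameterization`
so that steps are taken in the three dimensional tangent space.

.. code-block:: c++

   // Plain problem: the tangent space is the ambient Euclidean space.
   ceres::GradientProblem euclidean_problem(new Rosenbrock);

   // Problem over unit quaternions. QuaternionEnergy is a hypothetical
   // FirstOrderFunction with NumParameters() == 4; the parameterization
   // keeps the iterates on the unit sphere.
   ceres::GradientProblem quaternion_problem(
       new QuaternionEnergy, new ceres::QuaternionParameterization);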


.. function:: void Solve(const GradientProblemSolver::Options& options, const GradientProblem& problem, double* parameters, GradientProblemSolver::Summary* summary)

   Solve the given :class:`GradientProblem` using the values in
   ``parameters`` as the initial guess of the solution.

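Putting the pieces together, a minimal solve using the illustrative
``Rosenbrock`` function from above might look as follows.

.. code-block:: c++

   double parameters[2] = {-1.2, 1.0};  // Initial guess.

   ceres::GradientProblem problem(new Rosenbrock);

   ceres::GradientProblemSolver::Options options;
   options.minimizer_progress_to_stdout = true;

   ceres::GradientProblemSolver::Summary summary;
   ceres::Solve(options, problem, parameters, &summary);

   std::cout << summary.FullReport() << "\n";
   std::cout << "Final x: " << parameters[0]
             << " y: " << parameters[1] << "\n";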

Solving
=======

:class:`GradientProblemSolver::Options`
---------------------------------------

.. class:: GradientProblemSolver::Options

   :class:`GradientProblemSolver::Options` controls the overall
   behavior of the solver. We list the various settings and their
   default values below.

.. function:: bool GradientProblemSolver::Options::IsValid(string* error) const

   Validates the values in the options struct and returns ``true`` on
   success. If there is a problem, the method returns ``false`` with
   ``error`` containing a textual description of the cause.

.. member:: LineSearchDirectionType GradientProblemSolver::Options::line_search_direction_type

   Default: ``LBFGS``

   Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
   ``BFGS`` and ``LBFGS``.

.. member:: LineSearchType GradientProblemSolver::Options::line_search_type

   Default: ``WOLFE``

   Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
   Note that in order for the assumptions underlying the ``BFGS`` and
   ``LBFGS`` line search direction algorithms to be guaranteed to be
   satisfied, the ``WOLFE`` line search should be used.

.. member:: NonlinearConjugateGradientType GradientProblemSolver::Options::nonlinear_conjugate_gradient_type

   Default: ``FLETCHER_REEVES``

   Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and
   ``HESTENES_STIEFEL``.

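For example, switching to a Polak-Ribiere non-linear conjugate gradient
direction with an ``ARMIJO`` line search only requires setting the
corresponding options before calling ``Solve``; the choices shown here are
illustrative, not recommendations.

.. code-block:: c++

   ceres::GradientProblemSolver::Options options;
   options.line_search_direction_type = ceres::NONLINEAR_CONJUGATE_GRADIENT;
   options.nonlinear_conjugate_gradient_type = ceres::POLAK_RIBIERE;
   options.line_search_type = ceres::ARMIJO;
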
.. member:: int GradientProblemSolver::Options::max_lbfgs_rank

   Default: 20

   The L-BFGS hessian approximation is a low rank approximation to the
   inverse of the Hessian matrix. The rank of the approximation
   determines (linearly) the space and time complexity of using the
   approximation. The higher the rank, the better the quality of the
   approximation. The increase in quality is however bounded for a
   number of reasons.

   1. The method only uses secant information and not actual
      derivatives.

   2. The Hessian approximation is constrained to be positive
      definite.

   So increasing this rank to a large number will cost time and space
   complexity without the corresponding increase in solution
   quality. There are no hard and fast rules for choosing the maximum
   rank. The best choice usually requires some problem specific
   experimentation.

.. member:: bool GradientProblemSolver::Options::use_approximate_eigenvalue_bfgs_scaling

   Default: ``false``

   As part of the ``BFGS`` update step / ``LBFGS`` right-multiply
   step, the initial inverse Hessian approximation is taken to be the
   Identity. However, [Oren]_ showed that using instead :math:`I *
   \gamma`, where :math:`\gamma` is a scalar chosen to approximate an
   eigenvalue of the true inverse Hessian can result in improved
   convergence in a wide variety of cases. Setting
   ``use_approximate_eigenvalue_bfgs_scaling`` to true enables this
   scaling in ``BFGS`` (before first iteration) and ``LBFGS`` (at each
   iteration).

   Precisely, approximate eigenvalue scaling equates to

   .. math:: \gamma = \frac{y_k' s_k}{y_k' y_k}

   With:

   .. math:: y_k = \nabla f_{k+1} - \nabla f_k
   .. math:: s_k = x_{k+1} - x_k

   Where :math:`f()` is the line search objective and :math:`x` the
   vector of parameter values [NocedalWright]_.

   It is important to note that approximate eigenvalue scaling does
   **not** *always* improve convergence, and that it can in fact
   *significantly* degrade performance for certain classes of problem,
   which is why it is disabled by default. In particular it can
   degrade performance when the sensitivity of the problem to different
   parameters varies significantly, as in this case a single scalar
   factor fails to capture this variation and detrimentally downscales
   parts of the Jacobian approximation which correspond to
   low-sensitivity parameters. It can also reduce the robustness of the
   solution to errors in the Jacobians.

.. member:: LineSearchInterpolationType GradientProblemSolver::Options::line_search_interpolation_type

   Default: ``CUBIC``

   Degree of the polynomial used to approximate the objective
   function. Valid values are ``BISECTION``, ``QUADRATIC`` and
   ``CUBIC``.

.. member:: double GradientProblemSolver::Options::min_line_search_step_size

   The line search terminates if:

   .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and
   :math:`\Delta x_k` is the step change in the parameter values at
   the :math:`k`-th iteration.

.. member:: double GradientProblemSolver::Options::line_search_sufficient_function_decrease

   Default: ``1e-4``

   Solving the line search problem exactly is computationally
   prohibitive. Fortunately, line search based optimization algorithms
   can still guarantee convergence if instead of an exact solution,
   the line search algorithm returns a solution which decreases the
   value of the objective function sufficiently. More precisely, we
   are looking for a step size s.t.

   .. math:: f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]

   This condition is known as the Armijo condition.

.. member:: double GradientProblemSolver::Options::max_line_search_step_contraction

   Default: ``1e-3``

   In each iteration of the line search,

   .. math:: \text{new_step_size} \geq \text{max_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: double GradientProblemSolver::Options::min_line_search_step_contraction

   Default: ``0.6``

   In each iteration of the line search,

   .. math:: \text{new_step_size} \leq \text{min_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: int GradientProblemSolver::Options::max_num_line_search_step_size_iterations

   Default: ``20``

   Maximum number of trial step size iterations during each line
   search. If a step size satisfying the search conditions cannot be
   found within this number of trials, the line search will stop.

   As this is an 'artificial' constraint (one imposed by the user, not
   the underlying math), if ``WOLFE`` line search is being used, *and*
   points satisfying the Armijo sufficient (function) decrease
   condition have been found during the current search (in :math:`\leq`
   ``max_num_line_search_step_size_iterations``), then the step size
   with the lowest function value which satisfies the Armijo condition
   will be returned as the new valid step, even though it does *not*
   satisfy the strong Wolfe conditions. This behaviour protects
   against early termination of the optimizer at a sub-optimal point.

.. member:: int GradientProblemSolver::Options::max_num_line_search_direction_restarts

   Default: ``5``

   Maximum number of restarts of the line search direction algorithm
   before terminating the optimization. Restarts of the line search
   direction algorithm occur when the current algorithm fails to
   produce a new descent direction. This typically indicates a
   numerical failure, or a breakdown in the validity of the
   approximations used.

.. member:: double GradientProblemSolver::Options::line_search_sufficient_curvature_decrease

   Default: ``0.9``

   The strong Wolfe conditions consist of the Armijo sufficient
   decrease condition, and an additional requirement that the
   step size be chosen s.t. the *magnitude* ('strong' Wolfe
   conditions) of the gradient along the search direction
   decreases sufficiently. Precisely, this second condition
   is that we seek a step size s.t.

   .. math:: \|f'(\text{step_size})\| \leq \text{sufficient_curvature_decrease} * \|f'(0)\|

   Where :math:`f()` is the line search objective and :math:`f'()` is the derivative
   of :math:`f` with respect to the step size: :math:`\frac{d f}{d~\text{step size}}`.

.. member:: double GradientProblemSolver::Options::max_line_search_step_expansion

   Default: ``10.0``

   During the bracketing phase of a Wolfe line search, the step size
   is increased until either a point satisfying the Wolfe conditions
   is found, or an upper bound for a bracket containing a point
   satisfying the conditions is found. Precisely, at each iteration
   of the expansion:

   .. math:: \text{new_step_size} \leq \text{max_step_expansion} * \text{step_size}

   By definition, for expansion:

   .. math:: \text{max_step_expansion} > 1.0

.. member:: int GradientProblemSolver::Options::max_num_iterations

   Default: ``50``

   Maximum number of iterations for which the solver should run.

.. member:: double GradientProblemSolver::Options::max_solver_time_in_seconds

   Default: ``1e6``

   Maximum amount of time for which the solver should run.

.. member:: double GradientProblemSolver::Options::function_tolerance

   Default: ``1e-6``

   Solver terminates if

   .. math:: \frac{|\Delta \text{cost}|}{\text{cost}} \leq \text{function_tolerance}

   where :math:`\Delta \text{cost}` is the change in objective
   function value (up or down) in the current iteration of the line search.

.. member:: double GradientProblemSolver::Options::gradient_tolerance

   Default: ``1e-10``

   Solver terminates if

   .. math:: \|x - \Pi \boxplus(x, -g(x))\|_\infty \leq \text{gradient_tolerance}

   where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
   is projection onto the bounds constraints and :math:`\boxplus` is
   the Plus operation for the overall local parameterization associated
   with the parameter vector.

.. member:: double GradientProblemSolver::Options::parameter_tolerance

   Default: ``1e-8``

   Solver terminates if

   .. math:: \|\Delta x\| \leq (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}

   where :math:`\Delta x` is the step computed in the current
   iteration of the line search.
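
The termination tolerances above can be tightened or relaxed through the same
options struct; the values below are purely illustrative.

.. code-block:: c++

   ceres::GradientProblemSolver::Options options;
   options.max_num_iterations = 200;
   options.function_tolerance = 1e-8;
   options.gradient_tolerance = 1e-12;
   options.parameter_tolerance = 1e-10;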

.. member:: LoggingType GradientProblemSolver::Options::logging_type

   Default: ``PER_MINIMIZER_ITERATION``

.. member:: bool GradientProblemSolver::Options::minimizer_progress_to_stdout

   Default: ``false``

   By default the :class:`Minimizer` progress is logged to ``STDERR``
   depending on the ``vlog`` level. If this flag is set to true, and
   :member:`GradientProblemSolver::Options::logging_type` is not
   ``SILENT``, the logging output is sent to ``STDOUT``.

   The progress display looks like

   .. code-block:: bash

      0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e: 0 it: 2.98e-02 tt: 8.50e-02
      1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e: 1 it: 4.54e-02 tt: 1.31e-01
      2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e: 1 it: 4.96e-02 tt: 1.81e-01


   Here

   #. ``f`` is the value of the objective function.
   #. ``d`` is the change in the value of the objective function if
      the step computed in this iteration is accepted.
   #. ``g`` is the max norm of the gradient.
   #. ``h`` is the change in the parameter vector.
   #. ``s`` is the optimal step length computed by the line search.
   #. ``it`` is the time taken by the current iteration.
   #. ``tt`` is the total time taken by the minimizer.

.. member:: vector<IterationCallback> GradientProblemSolver::Options::callbacks

   Callbacks that are executed at the end of each iteration of the
   :class:`Minimizer`. They are executed in the order that they are
   specified in this vector. By default, parameter blocks are updated
   only at the end of the optimization, i.e., when the
   :class:`Minimizer` terminates. This behavior is controlled by
   :member:`GradientProblemSolver::Options::update_state_every_iteration`. If
   the user wishes to have access to the updated parameter blocks when
   his/her callbacks are executed, then set
   :member:`GradientProblemSolver::Options::update_state_every_iteration`
   to true.

   The solver does NOT take ownership of these pointers.

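A minimal sketch of such a callback is shown below; the ``LoggingCallback``
name is illustrative, while :class:`IterationCallback`,
:class:`IterationSummary` and ``SOLVER_CONTINUE`` are part of the Ceres API.

.. code-block:: c++

   // Prints the cost at the end of every iteration.
   class LoggingCallback : public ceres::IterationCallback {
    public:
     virtual ceres::CallbackReturnType operator()(
         const ceres::IterationSummary& summary) {
       std::cout << "iteration: " << summary.iteration
                 << " cost: " << summary.cost << "\n";
       return ceres::SOLVER_CONTINUE;
     }
   };

   // The callback is owned by the caller, not the solver.
   LoggingCallback logging_callback;
   options.callbacks.push_back(&logging_callback);
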

.. member:: bool GradientProblemSolver::Options::update_state_every_iteration

   Default: ``false``

   Normally the parameter vector is only updated when the solver
   terminates. Setting this to true updates it every iteration. This
   setting is useful when building an interactive application using
   Ceres and using an :class:`IterationCallback`.

:class:`GradientProblemSolver::Summary`
---------------------------------------

.. class:: GradientProblemSolver::Summary

   Summary of the various stages of the solver after termination.

.. function:: string GradientProblemSolver::Summary::BriefReport() const

   A brief one line description of the state of the solver after
   termination.

.. function:: string GradientProblemSolver::Summary::FullReport() const

   A full multiline description of the state of the solver after
   termination.

.. function:: bool GradientProblemSolver::Summary::IsSolutionUsable() const

   Whether the solution returned by the optimization algorithm can be
   relied on to be numerically sane. This will be the case if
   `GradientProblemSolver::Summary::termination_type` is set to `CONVERGENCE`,
   `USER_SUCCESS` or `NO_CONVERGENCE`, i.e., either the solver
   converged by meeting one of the convergence tolerances or because
   the user indicated that it had converged or it ran to the maximum
   number of iterations or time.

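Typically the report and this check are consulted right after ``Solve``
returns, along the lines of the following sketch.

.. code-block:: c++

   ceres::GradientProblemSolver::Summary summary;
   ceres::Solve(options, problem, parameters, &summary);

   std::cout << summary.BriefReport() << "\n";
   if (!summary.IsSolutionUsable()) {
     std::cout << "Solve failed: " << summary.message << "\n";
   }
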
.. member:: TerminationType GradientProblemSolver::Summary::termination_type

   The cause of the minimizer terminating.

.. member:: string GradientProblemSolver::Summary::message

   Reason why the solver terminated.

.. member:: double GradientProblemSolver::Summary::initial_cost

   Cost of the problem (value of the objective function) before the
   optimization.

.. member:: double GradientProblemSolver::Summary::final_cost

   Cost of the problem (value of the objective function) after the
   optimization.

.. member:: vector<IterationSummary> GradientProblemSolver::Summary::iterations

   :class:`IterationSummary` for each minimizer iteration in order.

.. member:: int GradientProblemSolver::Summary::num_cost_evaluations

   Number of times the cost (and not the gradient) was evaluated.

.. member:: int GradientProblemSolver::Summary::num_gradient_evaluations

   Number of times the gradient (and the cost) were evaluated.

.. member:: double GradientProblemSolver::Summary::total_time_in_seconds

   Time (in seconds) spent in the solver.

.. member:: double GradientProblemSolver::Summary::cost_evaluation_time_in_seconds

   Time (in seconds) spent evaluating the cost.

.. member:: double GradientProblemSolver::Summary::gradient_evaluation_time_in_seconds

   Time (in seconds) spent evaluating the gradient vector.

.. member:: int GradientProblemSolver::Summary::num_parameters

   Number of parameters in the problem.

.. member:: int GradientProblemSolver::Summary::num_local_parameters

   Dimension of the tangent space of the problem. This is different
   from :member:`GradientProblemSolver::Summary::num_parameters` if a
   :class:`LocalParameterization` object is used.

.. member:: LineSearchDirectionType GradientProblemSolver::Summary::line_search_direction_type

   Type of line search direction used.

.. member:: LineSearchType GradientProblemSolver::Summary::line_search_type

   Type of the line search algorithm used.

.. member:: LineSearchInterpolationType GradientProblemSolver::Summary::line_search_interpolation_type

   When performing line search, the degree of the polynomial used to
   approximate the objective function.

.. member:: NonlinearConjugateGradientType GradientProblemSolver::Summary::nonlinear_conjugate_gradient_type

   If the line search direction is `NONLINEAR_CONJUGATE_GRADIENT`,
   then this indicates the particular variant of non-linear conjugate
   gradient used.

.. member:: int GradientProblemSolver::Summary::max_lbfgs_rank

   If the type of the line search direction is `LBFGS`, then this
   indicates the rank of the Hessian approximation.