.. highlight:: c++

.. default-domain:: cpp

.. _chapter-gradient_problem_solver:

==================================
General Unconstrained Minimization
==================================

Modeling
========

:class:`FirstOrderFunction`
---------------------------

.. class:: FirstOrderFunction

   Instances of :class:`FirstOrderFunction` implement the evaluation of
   a function and its gradient.

   .. code-block:: c++

      class FirstOrderFunction {
       public:
        virtual ~FirstOrderFunction() {}
        virtual bool Evaluate(const double* const parameters,
                              double* cost,
                              double* gradient) const = 0;
        virtual int NumParameters() const = 0;
      };

.. function:: bool FirstOrderFunction::Evaluate(const double* const parameters, double* cost, double* gradient) const

   Evaluate the cost/value of the function. If ``gradient`` is not
   ``NULL`` then evaluate the gradient too. If the evaluation is
   successful return ``true``, else return ``false``.

   ``cost`` is guaranteed to never be ``NULL``, ``gradient`` can be ``NULL``.

.. function:: int FirstOrderFunction::NumParameters() const

   Number of parameters in the domain of the function.

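For illustration, here is a minimal sketch of a :class:`FirstOrderFunction`
that evaluates Rosenbrock's function
:math:`f(x, y) = (1 - x)^2 + 100(y - x^2)^2` and its gradient, in the spirit
of the ``rosenbrock.cc`` example that ships with Ceres. Note how ``Evaluate``
only fills in the gradient when a non-``NULL`` pointer is supplied.

.. code-block:: c++

   // Sketch of a FirstOrderFunction for Rosenbrock's function
   // f(x, y) = (1 - x)^2 + 100 (y - x^2)^2.
   class Rosenbrock : public ceres::FirstOrderFunction {
    public:
     virtual ~Rosenbrock() {}

     virtual bool Evaluate(const double* const parameters,
                           double* cost,
                           double* gradient) const {
       const double x = parameters[0];
       const double y = parameters[1];

       cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
       if (gradient != NULL) {
         // df/dx = -2 (1 - x) - 400 x (y - x^2), df/dy = 200 (y - x^2).
         gradient[0] = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x);
         gradient[1] = 200.0 * (y - x * x);
       }
       return true;
     }

     virtual int NumParameters() const { return 2; }
   };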

:class:`GradientProblem`
------------------------

.. class:: GradientProblem

.. code-block:: c++

   class GradientProblem {
    public:
     explicit GradientProblem(FirstOrderFunction* function);
     GradientProblem(FirstOrderFunction* function,
                     LocalParameterization* parameterization);
     int NumParameters() const;
     int NumLocalParameters() const;
     bool Evaluate(const double* parameters, double* cost, double* gradient) const;
     bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
   };

Instances of :class:`GradientProblem` represent general non-linear
optimization problems that must be solved using just the value of the
objective function and its gradient. Unlike the :class:`Problem`
class, which can only be used to model non-linear least squares
problems, instances of :class:`GradientProblem` are not restricted in
the form of the objective function.

Structurally :class:`GradientProblem` is a composition of a
:class:`FirstOrderFunction` and optionally a
:class:`LocalParameterization`.

The :class:`FirstOrderFunction` is responsible for evaluating the cost
and gradient of the objective function.

The :class:`LocalParameterization` is responsible for going back and
forth between the ambient space and the local tangent space. When a
:class:`LocalParameterization` is not provided, the tangent space is
assumed to coincide with the ambient Euclidean space that the
gradient vector lives in.

The constructor takes ownership of the :class:`FirstOrderFunction` and
:class:`LocalParameterization` objects passed to it.
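
As a sketch of the two constructors, the first problem below lives in the
ambient Euclidean space, while the second pairs a hypothetical four parameter
cost function ``QuaternionEnergy`` with a :class:`QuaternionParameterization`
so that steps are taken in the three dimensional tangent space.

.. code-block:: c++

   // Plain problem: the tangent space is the ambient Euclidean space.
   ceres::GradientProblem euclidean_problem(new Rosenbrock);

   // Problem over unit quaternions. QuaternionEnergy is a hypothetical
   // FirstOrderFunction with NumParameters() == 4; the parameterization
   // keeps the iterates on the unit sphere.
   ceres::GradientProblem quaternion_problem(
       new QuaternionEnergy, new ceres::QuaternionParameterization);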


.. function:: void Solve(const GradientProblemSolver::Options& options, const GradientProblem& problem, double* parameters, GradientProblemSolver::Summary* summary)

   Solve the given :class:`GradientProblem` using the values in
   ``parameters`` as the initial guess of the solution.

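Putting the pieces together, a minimal solve using the illustrative
``Rosenbrock`` function from above might look as follows.

.. code-block:: c++

   double parameters[2] = {-1.2, 1.0};  // Initial guess.

   ceres::GradientProblem problem(new Rosenbrock);

   ceres::GradientProblemSolver::Options options;
   options.minimizer_progress_to_stdout = true;

   ceres::GradientProblemSolver::Summary summary;
   ceres::Solve(options, problem, parameters, &summary);

   std::cout << summary.FullReport() << "\n";
   std::cout << "Final x: " << parameters[0]
             << " y: " << parameters[1] << "\n";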

Solving
=======

:class:`GradientProblemSolver::Options`
---------------------------------------

.. class:: GradientProblemSolver::Options

   :class:`GradientProblemSolver::Options` controls the overall
   behavior of the solver. We list the various settings and their
   default values below.

.. function:: bool GradientProblemSolver::Options::IsValid(string* error) const

   Validates the values in the options struct and returns ``true`` on
   success. If there is a problem, the method returns ``false`` with
   ``error`` containing a textual description of the cause.

.. member:: LineSearchDirectionType GradientProblemSolver::Options::line_search_direction_type

   Default: ``LBFGS``

   Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
   ``BFGS`` and ``LBFGS``.

.. member:: LineSearchType GradientProblemSolver::Options::line_search_type

   Default: ``WOLFE``

   Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
   Note that in order for the assumptions underlying the ``BFGS`` and
   ``LBFGS`` line search direction algorithms to be guaranteed to be
   satisfied, the ``WOLFE`` line search should be used.

.. member:: NonlinearConjugateGradientType GradientProblemSolver::Options::nonlinear_conjugate_gradient_type

   Default: ``FLETCHER_REEVES``

   Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and
   ``HESTENES_STIEFEL``.

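For example, switching to a Polak-Ribiere non-linear conjugate gradient
direction with an ``ARMIJO`` line search only requires setting the
corresponding options before calling ``Solve``; the choices shown here are
illustrative, not recommendations.

.. code-block:: c++

   ceres::GradientProblemSolver::Options options;
   options.line_search_direction_type = ceres::NONLINEAR_CONJUGATE_GRADIENT;
   options.nonlinear_conjugate_gradient_type = ceres::POLAK_RIBIERE;
   options.line_search_type = ceres::ARMIJO;
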
.. member:: int GradientProblemSolver::Options::max_lbfgs_rank

   Default: 20

   The L-BFGS hessian approximation is a low rank approximation to the
   inverse of the Hessian matrix. The rank of the approximation
   determines (linearly) the space and time complexity of using the
   approximation. The higher the rank, the better the quality of the
   approximation. The increase in quality is however bounded for a
   number of reasons.

   1. The method only uses secant information and not actual
      derivatives.

   2. The Hessian approximation is constrained to be positive
      definite.

   So increasing this rank to a large number will cost time and space
   complexity without the corresponding increase in solution
   quality. There are no hard and fast rules for choosing the maximum
   rank. The best choice usually requires some problem specific
   experimentation.

.. member:: bool GradientProblemSolver::Options::use_approximate_eigenvalue_bfgs_scaling

   Default: ``false``

   As part of the ``BFGS`` update step / ``LBFGS`` right-multiply
   step, the initial inverse Hessian approximation is taken to be the
   Identity. However, [Oren]_ showed that using instead :math:`I *
   \gamma`, where :math:`\gamma` is a scalar chosen to approximate an
   eigenvalue of the true inverse Hessian can result in improved
   convergence in a wide variety of cases. Setting
   ``use_approximate_eigenvalue_bfgs_scaling`` to true enables this
   scaling in ``BFGS`` (before first iteration) and ``LBFGS`` (at each
   iteration).

   Precisely, approximate eigenvalue scaling equates to

   .. math:: \gamma = \frac{y_k' s_k}{y_k' y_k}

   With:

   .. math:: y_k = \nabla f_{k+1} - \nabla f_k
   .. math:: s_k = x_{k+1} - x_k

   Where :math:`f()` is the line search objective and :math:`x` the
   vector of parameter values [NocedalWright]_.

   It is important to note that approximate eigenvalue scaling does
   **not** *always* improve convergence, and that it can in fact
   *significantly* degrade performance for certain classes of problem,
   which is why it is disabled by default. In particular it can
   degrade performance when the sensitivity of the problem to different
   parameters varies significantly, as in this case a single scalar
   factor fails to capture this variation and detrimentally downscales
   parts of the Jacobian approximation which correspond to
   low-sensitivity parameters. It can also reduce the robustness of the
   solution to errors in the Jacobians.

.. member:: LineSearchInterpolationType GradientProblemSolver::Options::line_search_interpolation_type

   Default: ``CUBIC``

   Degree of the polynomial used to approximate the objective
   function. Valid values are ``BISECTION``, ``QUADRATIC`` and
   ``CUBIC``.

.. member:: double GradientProblemSolver::Options::min_line_search_step_size

   The line search terminates if:

   .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and
   :math:`\Delta x_k` is the step change in the parameter values at
   the :math:`k`-th iteration.

.. member:: double GradientProblemSolver::Options::line_search_sufficient_function_decrease

   Default: ``1e-4``

   Solving the line search problem exactly is computationally
   prohibitive. Fortunately, line search based optimization algorithms
   can still guarantee convergence if instead of an exact solution,
   the line search algorithm returns a solution which decreases the
   value of the objective function sufficiently. More precisely, we
   are looking for a step size s.t.

   .. math:: f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]

   This condition is known as the Armijo condition.

.. member:: double GradientProblemSolver::Options::max_line_search_step_contraction

   Default: ``1e-3``

   In each iteration of the line search,

   .. math:: \text{new_step_size} \geq \text{max_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: double GradientProblemSolver::Options::min_line_search_step_contraction

   Default: ``0.6``

   In each iteration of the line search,

   .. math:: \text{new_step_size} \leq \text{min_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: int GradientProblemSolver::Options::max_num_line_search_step_size_iterations

   Default: ``20``

   Maximum number of trial step size iterations during each line
   search. If a step size satisfying the search conditions cannot be
   found within this number of trials, the line search will stop.

   As this is an 'artificial' constraint (one imposed by the user, not
   the underlying math), if ``WOLFE`` line search is being used, *and*
   points satisfying the Armijo sufficient (function) decrease
   condition have been found during the current search (in :math:`\leq`
   ``max_num_line_search_step_size_iterations``), then the step size
   with the lowest function value which satisfies the Armijo condition
   will be returned as the new valid step, even though it does *not*
   satisfy the strong Wolfe conditions. This behaviour protects
   against early termination of the optimizer at a sub-optimal point.

.. member:: int GradientProblemSolver::Options::max_num_line_search_direction_restarts

   Default: ``5``

   Maximum number of restarts of the line search direction algorithm
   before terminating the optimization. Restarts of the line search
   direction algorithm occur when the current algorithm fails to
   produce a new descent direction. This typically indicates a
   numerical failure, or a breakdown in the validity of the
   approximations used.

.. member:: double GradientProblemSolver::Options::line_search_sufficient_curvature_decrease

   Default: ``0.9``

   The strong Wolfe conditions consist of the Armijo sufficient
   decrease condition, and an additional requirement that the
   step size be chosen s.t. the *magnitude* ('strong' Wolfe
   conditions) of the gradient along the search direction
   decreases sufficiently. Precisely, this second condition
   is that we seek a step size s.t.

   .. math:: \|f'(\text{step_size})\| \leq \text{sufficient_curvature_decrease} * \|f'(0)\|

   Where :math:`f()` is the line search objective and :math:`f'()` is the derivative
   of :math:`f` with respect to the step size: :math:`\frac{d f}{d~\text{step size}}`.

.. member:: double GradientProblemSolver::Options::max_line_search_step_expansion

   Default: ``10.0``

   During the bracketing phase of a Wolfe line search, the step size
   is increased until either a point satisfying the Wolfe conditions
   is found, or an upper bound for a bracket containing a point
   satisfying the conditions is found. Precisely, at each iteration
   of the expansion:

   .. math:: \text{new_step_size} \leq \text{max_step_expansion} * \text{step_size}

   By definition, for expansion:

   .. math:: \text{max_step_expansion} > 1.0

.. member:: int GradientProblemSolver::Options::max_num_iterations

   Default: ``50``

   Maximum number of iterations for which the solver should run.

.. member:: double GradientProblemSolver::Options::max_solver_time_in_seconds

   Default: ``1e6``

   Maximum amount of time for which the solver should run.

.. member:: double GradientProblemSolver::Options::function_tolerance

   Default: ``1e-6``

   Solver terminates if

   .. math:: \frac{|\Delta \text{cost}|}{\text{cost}} \leq \text{function_tolerance}

   where :math:`\Delta \text{cost}` is the change in objective
   function value (up or down) in the current iteration of the line search.

.. member:: double GradientProblemSolver::Options::gradient_tolerance

   Default: ``1e-10``

   Solver terminates if

   .. math:: \|x - \Pi \boxplus(x, -g(x))\|_\infty \leq \text{gradient_tolerance}

   where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
   is projection onto the bounds constraints and :math:`\boxplus` is
   the Plus operation for the overall local parameterization associated
   with the parameter vector.

.. member:: double GradientProblemSolver::Options::parameter_tolerance

   Default: ``1e-8``

   Solver terminates if

   .. math:: \|\Delta x\| \leq (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}

   where :math:`\Delta x` is the step computed in the current
   iteration of the line search.
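
The termination tolerances above can be tightened or relaxed through the same
options struct; the values below are purely illustrative.

.. code-block:: c++

   ceres::GradientProblemSolver::Options options;
   options.max_num_iterations = 200;
   options.function_tolerance = 1e-8;
   options.gradient_tolerance = 1e-12;
   options.parameter_tolerance = 1e-10;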

.. member:: LoggingType GradientProblemSolver::Options::logging_type

   Default: ``PER_MINIMIZER_ITERATION``

.. member:: bool GradientProblemSolver::Options::minimizer_progress_to_stdout

   Default: ``false``

   By default the :class:`Minimizer` progress is logged to ``STDERR``
   depending on the ``vlog`` level. If this flag is set to true, and
   :member:`GradientProblemSolver::Options::logging_type` is not
   ``SILENT``, the logging output is sent to ``STDOUT``.

   The progress display looks like

   .. code-block:: bash

      0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e: 0 it: 2.98e-02 tt: 8.50e-02
      1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e: 1 it: 4.54e-02 tt: 1.31e-01
      2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e: 1 it: 4.96e-02 tt: 1.81e-01


   Here

   #. ``f`` is the value of the objective function.
   #. ``d`` is the change in the value of the objective function if
      the step computed in this iteration is accepted.
   #. ``g`` is the max norm of the gradient.
   #. ``h`` is the change in the parameter vector.
   #. ``s`` is the optimal step length computed by the line search.
   #. ``it`` is the time taken by the current iteration.
   #. ``tt`` is the total time taken by the minimizer.

.. member:: vector<IterationCallback> GradientProblemSolver::Options::callbacks

   Callbacks that are executed at the end of each iteration of the
   :class:`Minimizer`. They are executed in the order that they are
   specified in this vector. By default, parameter blocks are updated
   only at the end of the optimization, i.e., when the
   :class:`Minimizer` terminates. This behavior is controlled by
   :member:`GradientProblemSolver::Options::update_state_every_iteration`. If
   the user wishes to have access to the updated parameter blocks when
   his/her callbacks are executed, then set
   :member:`GradientProblemSolver::Options::update_state_every_iteration`
   to true.

   The solver does NOT take ownership of these pointers.

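A minimal sketch of such a callback is shown below; the ``LoggingCallback``
name is illustrative, while :class:`IterationCallback`,
:class:`IterationSummary` and ``SOLVER_CONTINUE`` are part of the Ceres API.

.. code-block:: c++

   // Prints the cost at the end of every iteration.
   class LoggingCallback : public ceres::IterationCallback {
    public:
     virtual ceres::CallbackReturnType operator()(
         const ceres::IterationSummary& summary) {
       std::cout << "iteration: " << summary.iteration
                 << " cost: " << summary.cost << "\n";
       return ceres::SOLVER_CONTINUE;
     }
   };

   // The callback is owned by the caller, not the solver.
   LoggingCallback logging_callback;
   options.callbacks.push_back(&logging_callback);
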

.. member:: bool GradientProblemSolver::Options::update_state_every_iteration

   Default: ``false``

   Normally the parameter vector is only updated when the solver
   terminates. Setting this to true updates it every iteration. This
   setting is useful when building an interactive application using
   Ceres and using an :class:`IterationCallback`.

:class:`GradientProblemSolver::Summary`
---------------------------------------

.. class:: GradientProblemSolver::Summary

   Summary of the various stages of the solver after termination.

.. function:: string GradientProblemSolver::Summary::BriefReport() const

   A brief one line description of the state of the solver after
   termination.

.. function:: string GradientProblemSolver::Summary::FullReport() const

   A full multiline description of the state of the solver after
   termination.

.. function:: bool GradientProblemSolver::Summary::IsSolutionUsable() const

   Whether the solution returned by the optimization algorithm can be
   relied on to be numerically sane. This will be the case if
   `GradientProblemSolver::Summary::termination_type` is set to `CONVERGENCE`,
   `USER_SUCCESS` or `NO_CONVERGENCE`, i.e., either the solver
   converged by meeting one of the convergence tolerances or because
   the user indicated that it had converged or it ran to the maximum
   number of iterations or time.

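Typically the report and this check are consulted right after ``Solve``
returns, along the lines of the following sketch.

.. code-block:: c++

   ceres::GradientProblemSolver::Summary summary;
   ceres::Solve(options, problem, parameters, &summary);

   std::cout << summary.BriefReport() << "\n";
   if (!summary.IsSolutionUsable()) {
     std::cout << "Solve failed: " << summary.message << "\n";
   }
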
.. member:: TerminationType GradientProblemSolver::Summary::termination_type

   The cause of the minimizer terminating.

.. member:: string GradientProblemSolver::Summary::message

   Reason why the solver terminated.

.. member:: double GradientProblemSolver::Summary::initial_cost

   Cost of the problem (value of the objective function) before the
   optimization.

.. member:: double GradientProblemSolver::Summary::final_cost

   Cost of the problem (value of the objective function) after the
   optimization.

.. member:: vector<IterationSummary> GradientProblemSolver::Summary::iterations

   :class:`IterationSummary` for each minimizer iteration in order.

.. member:: int GradientProblemSolver::Summary::num_cost_evaluations

   Number of times the cost (and not the gradient) was evaluated.

.. member:: int GradientProblemSolver::Summary::num_gradient_evaluations

   Number of times the gradient (and the cost) were evaluated.

.. member:: double GradientProblemSolver::Summary::total_time_in_seconds

   Time (in seconds) spent in the solver.

.. member:: double GradientProblemSolver::Summary::cost_evaluation_time_in_seconds

   Time (in seconds) spent evaluating the cost.

.. member:: double GradientProblemSolver::Summary::gradient_evaluation_time_in_seconds

   Time (in seconds) spent evaluating the gradient vector.

.. member:: int GradientProblemSolver::Summary::num_parameters

   Number of parameters in the problem.

.. member:: int GradientProblemSolver::Summary::num_local_parameters

   Dimension of the tangent space of the problem. This is different
   from :member:`GradientProblemSolver::Summary::num_parameters` if a
   :class:`LocalParameterization` object is used.

.. member:: LineSearchDirectionType GradientProblemSolver::Summary::line_search_direction_type

   Type of line search direction used.

.. member:: LineSearchType GradientProblemSolver::Summary::line_search_type

   Type of the line search algorithm used.

.. member:: LineSearchInterpolationType GradientProblemSolver::Summary::line_search_interpolation_type

   When performing line search, the degree of the polynomial used to
   approximate the objective function.

.. member:: NonlinearConjugateGradientType GradientProblemSolver::Summary::nonlinear_conjugate_gradient_type

   If the line search direction is `NONLINEAR_CONJUGATE_GRADIENT`,
   then this indicates the particular variant of non-linear conjugate
   gradient used.

.. member:: int GradientProblemSolver::Summary::max_lbfgs_rank

   If the type of the line search direction is `LBFGS`, then this
   indicates the rank of the Hessian approximation.