.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-nnls_solving:

================================
Solving Non-linear Least Squares
================================

Introduction
============

Effective use of Ceres requires some familiarity with the basic
components of a non-linear least squares solver, so before we describe
how to configure and use the solver, we will take a brief look at how
some of the core optimization algorithms in Ceres work.

Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
variables, and
:math:`F(x) = \left[f_1(x), ... , f_{m}(x) \right]^{\top}` be an
:math:`m`-dimensional function of :math:`x`. We are interested in
solving the optimization problem [#f1]_

.. math:: \arg \min_x \frac{1}{2}\|F(x)\|^2\ . \\
          L \le x \le U
   :label: nonlinsq

where :math:`L` and :math:`U` are lower and upper bounds on the
parameter vector :math:`x`.

Since the efficient global minimization of :eq:`nonlinsq` for
general :math:`F(x)` is an intractable problem, we will have to settle
for finding a local minimum.

In the following, the Jacobian :math:`J(x)` of :math:`F(x)` is an
:math:`m\times n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)`
and the gradient vector is :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2
= J(x)^\top F(x)`.

The general strategy when solving non-linear optimization problems is
to solve a sequence of approximations to the original problem
[NocedalWright]_. At each iteration, the approximation is solved to
determine a correction :math:`\Delta x` to the vector :math:`x`. For
non-linear least squares, an approximation can be constructed by using
the linearization :math:`F(x+\Delta x) \approx F(x) + J(x)\Delta x`,
which leads to the following linear least squares problem:

.. math:: \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
   :label: linearapprox

Unfortunately, naively solving a sequence of these problems and
updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that
may not converge. To get a convergent algorithm, we need to control
the size of the step :math:`\Delta x`. Depending on how the size of
the step :math:`\Delta x` is controlled, non-linear optimization
algorithms can be divided into two major categories [NocedalWright]_.

1. **Trust Region** The trust region approach approximates the
   objective function using a model function (often a quadratic)
   over a subset of the search space known as the trust region. If the
   model function succeeds in minimizing the true objective function
   the trust region is expanded; otherwise it is contracted and the
   model optimization problem is solved again.

2. **Line Search** The line search approach first finds a descent
   direction along which the objective function will be reduced and
   then computes a step size that decides how far to move along
   that direction. The descent direction can be computed by various
   methods, such as gradient descent, Newton's method and Quasi-Newton
   methods. The step size can be determined either exactly or
   inexactly.

Trust region methods are in some sense dual to line search methods:
trust region methods first choose a step size (the size of the trust
region) and then a step direction while line search methods first
choose a step direction and then a step size. Ceres implements
multiple algorithms in both categories.

.. _section-trust-region-methods:

Trust Region Methods
====================

The basic trust region algorithm looks something like this.

   1. Given an initial point :math:`x` and a trust region radius :math:`\mu`.
   2. Solve

      .. math::
         \arg \min_{\Delta x}& \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 \\
         \text{such that} &\|D(x)\Delta x\|^2 \le \mu\\
         &L \le x + \Delta x \le U.

   3. :math:`\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 -
      \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 -
      \|F(x)\|^2}`
   4. if :math:`\rho > \epsilon` then :math:`x = x + \Delta x`.
   5. if :math:`\rho > \eta_1` then :math:`\mu = 2 \mu`
   6. else if :math:`\rho < \eta_2` then :math:`\mu = 0.5 * \mu`
   7. Go to 2.

Here, :math:`\mu` is the trust region radius, :math:`D(x)` is some
matrix used to define a metric on the domain of :math:`F(x)` and
:math:`\rho` measures the quality of the step :math:`\Delta x`, i.e.,
how well the linear model predicted the decrease in the value of the
non-linear objective. The idea is to increase or decrease the radius
of the trust region depending on how well the linearization predicts
the behavior of the non-linear objective, which in turn is reflected
in the value of :math:`\rho`.

The key computational step in a trust-region algorithm is the solution
of the constrained optimization problem

.. math:: \arg \min_{\Delta x}&\quad \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 \\
          \text{such that} &\quad \|D(x)\Delta x\|^2 \le \mu\\
          &\quad L \le x + \Delta x \le U.
   :label: trp

There are a number of different ways of solving this problem, each
giving rise to a different concrete trust-region algorithm. Currently,
Ceres implements two trust-region algorithms - Levenberg-Marquardt
and Dogleg, each of which is augmented with a line search if bounds
constraints are present [Kanzow]_. The user can choose between them by
setting :member:`Solver::Options::trust_region_strategy_type`.
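
For example, a minimal sketch of configuring the trust region strategy
(assuming a :class:`Problem` named ``problem`` has already been built)
might look like:

.. code-block:: c++

   ceres::Solver::Options options;
   // LEVENBERG_MARQUARDT is the default; DOGLEG is the other choice.
   options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;

   ceres::Solver::Summary summary;
   ceres::Solve(options, &problem, &summary);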

.. rubric:: Footnotes

.. [#f1] At the level of the non-linear solver, the block structure is
   not relevant, therefore our discussion here is in terms of an
   optimization problem defined over a state vector of size
   :math:`n`. Similarly the presence of loss functions is also
   ignored as the problem is internally converted into a pure
   non-linear least squares problem.


.. _section-levenberg-marquardt:

Levenberg-Marquardt
-------------------

The Levenberg-Marquardt algorithm [Levenberg]_ [Marquardt]_ is the
most popular algorithm for solving non-linear least squares problems.
It was also the first trust region algorithm to be developed
[Levenberg]_ [Marquardt]_. Ceres implements an exact step [Madsen]_
and an inexact step variant of the Levenberg-Marquardt algorithm
[WrightHolt]_ [NashSofer]_.

It can be shown that the solution to :eq:`trp` can be obtained by
solving an unconstrained optimization problem of the form

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda \|D(x)\Delta x\|^2

where :math:`\lambda` is a Lagrange multiplier that is inversely
related to :math:`\mu`. In Ceres, we solve for

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
   :label: lsqr

The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.

Before going further, let us make some notational simplifications. We
will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been concatenated
at the bottom of the matrix :math:`J` and similarly a vector of zeros
has been added to the bottom of the vector :math:`f` and the rest of
our discussion will be in terms of :math:`J` and :math:`f`, i.e., the
linear least squares problem.

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple

For all but the smallest problems the solution of :eq:`simple` in
each iteration of the Levenberg-Marquardt algorithm is the dominant
computational cost in Ceres. Ceres provides a number of different
options for solving :eq:`simple`. There are two major classes of
methods - factorization and iterative.

The factorization methods are based on computing an exact solution of
:eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
step Levenberg-Marquardt algorithm. But it is not clear if an exact
solution of :eq:`lsqr` is necessary at each step of the LM algorithm
to solve :eq:`nonlinsq`. In fact, we have already seen evidence
that this may not be the case, as :eq:`lsqr` is itself a regularized
version of :eq:`linearapprox`. Indeed, it is possible to
construct non-linear optimization algorithms in which the linearized
problem is solved approximately. These algorithms are known as inexact
Newton or truncated Newton methods [NocedalWright]_.

An inexact Newton method requires two ingredients. First, a cheap
method for approximately solving systems of linear
equations. Typically an iterative linear solver like the Conjugate
Gradients method is used for this
purpose [NocedalWright]_. Second, a termination rule for
the iterative solver. A typical termination rule is of the form

.. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
   :label: inexact

Here, :math:`k` indicates the Levenberg-Marquardt iteration number and
:math:`0 < \eta_k <1` is known as the forcing sequence. [WrightHolt]_
prove that a truncated Levenberg-Marquardt algorithm that uses an
inexact Newton step based on :eq:`inexact` converges for any
sequence :math:`\eta_k \leq \eta_0 < 1` and the rate of convergence
depends on the choice of the forcing sequence :math:`\eta_k`.

Ceres supports both exact and inexact step solution strategies. When
the user chooses a factorization based linear solver, the exact step
Levenberg-Marquardt algorithm is used. When the user chooses an
iterative linear solver, the inexact step Levenberg-Marquardt
algorithm is used.

.. _section-dogleg:

Dogleg
------

Another strategy for solving the trust region problem :eq:`trp` was
introduced by M. J. D. Powell. The key idea there is to compute two
vectors

.. math::

   \Delta x^{\text{Gauss-Newton}} &= \arg \min_{\Delta x}\frac{1}{2} \|J(x)\Delta x + f(x)\|^2.\\
   \Delta x^{\text{Cauchy}} &= -\frac{\|g(x)\|^2}{\|J(x)g(x)\|^2}g(x).

Note that the vector :math:`\Delta x^{\text{Gauss-Newton}}` is the
solution to :eq:`linearapprox` and :math:`\Delta
x^{\text{Cauchy}}` is the vector that minimizes the linear
approximation if we restrict ourselves to moving along the direction
of the gradient. The Dogleg method finds a vector :math:`\Delta x`
defined by :math:`\Delta x^{\text{Gauss-Newton}}` and :math:`\Delta
x^{\text{Cauchy}}` that solves the trust region problem. Ceres
supports two variants that can be chosen by setting
:member:`Solver::Options::dogleg_type`.
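
As an illustrative sketch (again assuming an existing :class:`Problem`
named ``problem``), the subspace variant can be selected as follows:

.. code-block:: c++

   ceres::Solver::Options options;
   options.trust_region_strategy_type = ceres::DOGLEG;
   options.dogleg_type = ceres::SUBSPACE_DOGLEG;

   ceres::Solver::Summary summary;
   ceres::Solve(options, &problem, &summary);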

``TRADITIONAL_DOGLEG``, as described by Powell, constructs two line
segments using the Gauss-Newton and Cauchy vectors and finds the point
farthest along this line, shaped like a dogleg (hence the name), that is
contained in the trust region. For more details on the exact reasoning
and computations, please see Madsen et al. [Madsen]_.

``SUBSPACE_DOGLEG`` is a more sophisticated method that considers the
entire two dimensional subspace spanned by these two vectors and finds
the point that minimizes the trust region problem in this subspace
[ByrdSchnabel]_.

The key advantage of the Dogleg method over Levenberg-Marquardt is that if
the step computation for a particular choice of :math:`\mu` does not
result in sufficient decrease in the value of the objective function,
Levenberg-Marquardt solves the linear approximation from scratch with
a smaller value of :math:`\mu`. Dogleg, on the other hand, only needs
to compute the interpolation between the Gauss-Newton and the Cauchy
vectors, as neither of them depends on the value of :math:`\mu`.

The Dogleg method can only be used with the exact factorization based
linear solvers.

.. _section-inner-iterations:

Inner Iterations
----------------

Some non-linear least squares problems have additional structure in
the way the parameter blocks interact, such that it is beneficial to
modify the way the trust region step is computed. For example,
consider the following regression problem

.. math:: y = a_1 e^{b_1 x} + a_2 e^{b_2 x^2 + c_1}


Given a set of pairs :math:`\{(x_i, y_i)\}`, the user wishes to estimate
:math:`a_1, a_2, b_1, b_2`, and :math:`c_1`.

Notice that the expression on the right hand side is linear in :math:`a_1` and
:math:`a_2`, and given any value for :math:`b_1, b_2` and :math:`c_1`,
it is possible to use linear regression to estimate the optimal values
of :math:`a_1` and :math:`a_2`. It's possible to analytically
eliminate the variables :math:`a_1` and :math:`a_2` from the problem
entirely. Problems like these are known as separable least squares
problems and the most famous algorithm for solving them is the Variable
Projection algorithm invented by Golub & Pereyra [GolubPereyra]_.

Similar structure can be found in the matrix factorization with
missing data problem. There the corresponding algorithm is known as
Wiberg's algorithm [Wiberg]_.

Ruhe & Wedin present an analysis of various algorithms for solving
separable non-linear least squares problems and refer to *Variable
Projection* as Algorithm I in their paper [RuheWedin]_.

Implementing Variable Projection is tedious and expensive. Ruhe &
Wedin present a simpler algorithm with comparable convergence
properties, which they call Algorithm II. Algorithm II performs an
additional optimization step to estimate :math:`a_1` and :math:`a_2`
exactly after computing a successful Newton step.


This idea can be generalized to cases where the residual is not
linear in :math:`a_1` and :math:`a_2`, i.e.,

.. math:: y = f_1(a_1, e^{b_1 x}) + f_2(a_2, e^{b_2 x^2 + c_1})

In this case, we solve for the trust region step for the full problem,
and then use it as the starting point to further optimize just `a_1`
and `a_2`. For the linear case, this amounts to doing a single linear
least squares solve. For non-linear problems, any method for solving
the :math:`a_1` and :math:`a_2` optimization problems will do. The
only constraint on :math:`a_1` and :math:`a_2` (if they are two
different parameter blocks) is that they do not co-occur in a residual
block.

This idea can be further generalized, by not just optimizing
:math:`(a_1, a_2)`, but decomposing the graph corresponding to the
Hessian matrix's sparsity structure into a collection of
non-overlapping independent sets and optimizing each of them.

Setting :member:`Solver::Options::use_inner_iterations` to ``true``
enables the use of this non-linear generalization of Ruhe & Wedin's
Algorithm II. This version of Ceres has a higher iteration
complexity, but also displays better convergence behavior per
iteration.

Setting :member:`Solver::Options::num_threads` to the maximum number
possible is highly recommended.
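
A minimal sketch of enabling inner iterations (``problem`` is assumed
to be an existing :class:`Problem`, and the thread count is just an
example value):

.. code-block:: c++

   ceres::Solver::Options options;
   options.use_inner_iterations = true;
   // Use as many threads as the machine has available.
   options.num_threads = 8;

   ceres::Solver::Summary summary;
   ceres::Solve(options, &problem, &summary);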

.. _section-non-monotonic-steps:

Non-monotonic Steps
-------------------

Note that the basic trust-region algorithm described in
:ref:`section-trust-region-methods` is a descent algorithm in that it
only accepts a point if it strictly reduces the value of the objective
function.

Relaxing this requirement allows the algorithm to be more efficient in
the long term at the cost of some local increase in the value of the
objective function.

This is because allowing for non-decreasing objective function values
in a principled manner allows the algorithm to *jump over boulders*, as
the method is not restricted to moving into narrow valleys, while
preserving its convergence properties.

Setting :member:`Solver::Options::use_nonmonotonic_steps` to ``true``
enables the non-monotonic trust region algorithm as described by Conn,
Gould & Toint in [Conn]_.

Even though the value of the objective function may be larger
than the minimum value encountered over the course of the
optimization, the final parameters returned to the user are the
ones corresponding to the minimum cost over all iterations.

The option to take non-monotonic steps is available for all trust
region strategies.
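
For example, a sketch of enabling non-monotonic steps (``problem``
assumed to exist):

.. code-block:: c++

   ceres::Solver::Options options;
   options.use_nonmonotonic_steps = true;
   options.max_consecutive_nonmonotonic_steps = 5;  // The default window size.

   ceres::Solver::Summary summary;
   ceres::Solve(options, &problem, &summary);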


.. _section-line-search-methods:

Line Search Methods
===================

The line search method in Ceres Solver cannot handle bounds
constraints right now, so it can only be used for solving
unconstrained problems.

The basic line search algorithm looks something like this.

   1. Given an initial point :math:`x`
   2. :math:`\Delta x = -H^{-1}(x) g(x)`
   3. :math:`\arg \min_\mu \frac{1}{2} \| F(x + \mu \Delta x) \|^2`
   4. :math:`x = x + \mu \Delta x`
   5. Goto 2.

Here :math:`H(x)` is some approximation to the Hessian of the
objective function, and :math:`g(x)` is the gradient at
:math:`x`. Depending on the choice of :math:`H(x)` we get a variety of
different search directions :math:`\Delta x`.

Step 3, which is a one dimensional optimization or `Line Search` along
:math:`\Delta x`, is what gives this class of methods its name.

Different line search algorithms differ in their choice of the search
direction :math:`\Delta x` and the method used for one dimensional
optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
primary source of computational complexity in these
methods. Currently, Ceres Solver supports four choices of search
directions, all aimed at large scale problems.

1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
   be the identity matrix. This is not a good search direction for
   anything but the simplest of problems. It is only included here
   for completeness.

2. ``NONLINEAR_CONJUGATE_GRADIENT`` A generalization of the Conjugate
   Gradient method to non-linear functions. The generalization can be
   performed in a number of different ways, resulting in a variety of
   search directions. Ceres Solver currently supports
   ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and ``HESTENES_STIEFEL``
   directions.

3. ``BFGS`` A generalization of the Secant method to multiple
   dimensions in which a full, dense approximation to the inverse
   Hessian is maintained and used to compute a quasi-Newton step
   [NocedalWright]_. BFGS is currently the best known general
   quasi-Newton algorithm.

4. ``LBFGS`` A limited memory approximation to the full ``BFGS``
   method in which the last `M` iterations are used to approximate the
   inverse Hessian used to compute a quasi-Newton step [Nocedal]_,
   [ByrdNocedal]_.

Currently Ceres Solver supports both a backtracking and interpolation
based Armijo line search algorithm, and a sectioning / zoom
interpolation (strong) Wolfe condition line search algorithm.
However, note that in order for the assumptions underlying the
``BFGS`` and ``LBFGS`` methods to be guaranteed to be satisfied, the
Wolfe line search algorithm should be used.
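
As a sketch (assuming an existing unconstrained :class:`Problem` named
``problem``), a typical line search configuration might look like:

.. code-block:: c++

   ceres::Solver::Options options;
   options.minimizer_type = ceres::LINE_SEARCH;
   options.line_search_direction_type = ceres::LBFGS;
   // WOLFE is needed for the assumptions underlying BFGS/LBFGS to hold.
   options.line_search_type = ceres::WOLFE;

   ceres::Solver::Summary summary;
   ceres::Solve(options, &problem, &summary);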

.. _section-linear-solver:

LinearSolver
============

Recall that in both of the trust-region methods described above, the
key computational cost is the solution of a linear least squares
problem of the form

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple2

Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
f(x)`. For notational convenience let us also drop the dependence on
:math:`x`. Then it is easy to see that solving :eq:`simple2` is
equivalent to solving the *normal equations*.

.. math:: H \Delta x = g
   :label: normal

Ceres provides a number of different options for solving :eq:`normal`.

.. _section-qr:

``DENSE_QR``
------------

For small problems (a couple of hundred parameters and a few thousand
residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
:math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
an upper triangular matrix [TrefethenBau]_. Then it can be shown that
the solution to :eq:`normal` is given by

.. math:: \Delta x^* = -R^{-1}Q^\top f

Ceres uses ``Eigen`` 's dense QR factorization routines.
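
A sketch of selecting this solver (``problem`` assumed to exist):

.. code-block:: c++

   ceres::Solver::Options options;
   options.linear_solver_type = ceres::DENSE_QR;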

.. _section-cholesky:

``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
------------------------------------------------------

Large non-linear least squares problems are usually sparse. In such
cases, using a dense QR factorization is inefficient. Let :math:`H =
R^\top R` be the Cholesky factorization of the normal equations, where
:math:`R` is an upper triangular matrix, then the solution to
:eq:`normal` is given by

.. math::

   \Delta x^* = R^{-1} R^{-\top} g.

The observant reader will note that the :math:`R` in the Cholesky
factorization of :math:`H` is the same upper triangular matrix
:math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
Q^\top Q R = R^\top R`. There are two variants of Cholesky
factorization -- sparse and dense.

``DENSE_NORMAL_CHOLESKY`` as the name implies performs a dense
Cholesky factorization of the normal equations. Ceres uses
``Eigen`` 's dense LDLT factorization routines.

``SPARSE_NORMAL_CHOLESKY``, as the name implies performs a sparse
Cholesky factorization of the normal equations. This leads to
substantial savings in time and memory for large sparse
problems. Ceres uses the sparse Cholesky factorization routines in
Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_
or the sparse Cholesky factorization algorithm in ``Eigen`` (which
incidentally is a port of the algorithm implemented inside ``CXSparse``).
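
A sketch of selecting these solvers (``problem`` assumed to exist):

.. code-block:: c++

   ceres::Solver::Options options;
   // For large sparse problems (requires a sparse linear algebra library):
   options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
   // Or, for small dense problems:
   // options.linear_solver_type = ceres::DENSE_NORMAL_CHOLESKY;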

.. _section-cgnr:

``CGNR``
--------

For general sparse problems, if the problem is too large for
``CHOLMOD`` or a sparse linear algebra library is not linked into
Ceres, another option is the ``CGNR`` solver. This solver uses the
Conjugate Gradients solver on the *normal equations*, but without
forming the normal equations explicitly. It exploits the relation

.. math::
   H x = J^\top J x = J^\top(J x)

The convergence of Conjugate Gradients depends on the condition
number :math:`\kappa(H)`. Usually :math:`H` is poorly conditioned and
a :ref:`section-preconditioner` must be used to get reasonable
performance. Currently only the ``JACOBI`` preconditioner is available
for use with ``CGNR``. It uses the block diagonal of :math:`H` to
precondition the normal equations.

When the user chooses ``CGNR`` as the linear solver, Ceres
automatically switches from the exact step algorithm to an inexact
step algorithm.
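
A sketch of selecting this solver:

.. code-block:: c++

   ceres::Solver::Options options;
   options.linear_solver_type = ceres::CGNR;
   // Currently only the block Jacobi preconditioner is available with CGNR.
   options.preconditioner_type = ceres::JACOBI;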

.. _section-schur:

``DENSE_SCHUR`` & ``SPARSE_SCHUR``
----------------------------------

While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
adjustment problems, bundle adjustment problems have a special
structure, and a more efficient scheme for solving :eq:`normal`
can be constructed.

Suppose that the SfM problem consists of :math:`p` cameras and
:math:`q` points and the variable vector :math:`x` has the block
structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`, where
:math:`y` and :math:`z` correspond to camera and point parameters,
respectively. Further, let the camera blocks be of size :math:`c` and
the point blocks be of size :math:`s` (for most problems :math:`c` =
:math:`6`--:math:`9` and :math:`s = 3`). Ceres does not impose any constancy
requirement on these block sizes, but choosing them to be constant
simplifies the exposition.

A key characteristic of the bundle adjustment problem is that there is
no term :math:`f_{i}` that includes two or more point blocks. This in
turn implies that the matrix :math:`H` is of the form

.. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
   :label: hblock

where :math:`B \in \mathbb{R}^{pc\times pc}` is a block sparse matrix
with :math:`p` blocks of size :math:`c\times c` and :math:`C \in
\mathbb{R}^{qs\times qs}` is a block diagonal matrix with :math:`q` blocks
of size :math:`s\times s`. :math:`E \in \mathbb{R}^{pc\times qs}` is a
general block sparse matrix, with a block of size :math:`c\times s`
for each observation. Let us now block partition :math:`\Delta x =
[\Delta y,\Delta z]` and :math:`g=[v,w]` to restate :eq:`normal`
as the block structured linear system

.. math:: \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix}
          \right]\left[ \begin{matrix} \Delta y \\ \Delta z
          \end{matrix} \right] = \left[ \begin{matrix} v\\ w
          \end{matrix} \right]\ ,
   :label: linear2

and apply Gaussian elimination to it. As we noted above, :math:`C` is
a block diagonal matrix, with small diagonal blocks of size
:math:`s\times s`. Thus, calculating the inverse of :math:`C` by
inverting each of these blocks is cheap. This allows us to eliminate
:math:`\Delta z` by observing that :math:`\Delta z = C^{-1}(w - E^\top
\Delta y)`, giving us

.. math:: \left[B - EC^{-1}E^\top\right] \Delta y = v - EC^{-1}w\ .
   :label: schur

The matrix

.. math:: S = B - EC^{-1}E^\top

is the Schur complement of :math:`C` in :math:`H`. It is also known as
the *reduced camera matrix*, because the only variables
participating in :eq:`schur` are the ones corresponding to the
cameras. :math:`S \in \mathbb{R}^{pc\times pc}` is a block structured
symmetric positive definite matrix, with blocks of size :math:`c\times
c`. The block :math:`S_{ij}` corresponding to the pair of images
:math:`i` and :math:`j` is non-zero if and only if the two images
observe at least one common point.

Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
:math:`\Delta y`, and then back-substituting :math:`\Delta y` to
obtain the value of :math:`\Delta z`. Thus, the solution of what was
an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
and matrix-vector multiplies, and the solution of the block sparse
:math:`pc\times pc` linear system :eq:`schur`. For almost all
problems, the number of cameras is much smaller than the number of
points, :math:`p \ll q`, thus solving :eq:`schur` is
significantly cheaper than solving :eq:`linear2`. This is the
*Schur complement trick* [Brown]_.

This still leaves open the question of solving :eq:`schur`. The
method of choice for solving symmetric positive definite systems
exactly is via the Cholesky factorization [TrefethenBau]_ and
depending upon the structure of the matrix, there are, in general, two
options. The first is direct factorization, where we store and factor
:math:`S` as a dense matrix [TrefethenBau]_. This method has
:math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
is only practical for problems with up to a few hundred cameras. Ceres
implements this strategy as the ``DENSE_SCHUR`` solver.

But, :math:`S` is typically a fairly sparse matrix, as most images
only see a small fraction of the scene. This leads us to the second
option: Sparse Direct Methods. These methods store :math:`S` as a
sparse matrix, use row and column re-ordering algorithms to maximize
the sparsity of the Cholesky decomposition, and focus their compute
effort on the non-zero part of the factorization [Chen]_. Sparse
direct methods, depending on the exact sparsity structure of the Schur
complement, allow bundle adjustment algorithms to significantly scale
up over those based on dense factorization. Ceres implements this
strategy as the ``SPARSE_SCHUR`` solver.
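
A sketch of selecting the Schur based solvers for a bundle adjustment
style problem:

.. code-block:: c++

   ceres::Solver::Options options;
   // For problems with up to a few hundred cameras:
   options.linear_solver_type = ceres::DENSE_SCHUR;
   // For larger, sparse problems:
   // options.linear_solver_type = ceres::SPARSE_SCHUR;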

.. _section-iterative_schur:

``ITERATIVE_SCHUR``
-------------------

Another option for bundle adjustment problems is to apply
Preconditioned Conjugate Gradients to the reduced camera matrix
:math:`S` instead of :math:`H`. One reason to do this is that
:math:`S` is a much smaller matrix than :math:`H`, but more
importantly, it can be shown that :math:`\kappa(S)\leq \kappa(H)`.
Ceres implements Conjugate Gradients on :math:`S` as the
``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
as the linear solver, Ceres automatically switches from the exact step
algorithm to an inexact step algorithm.

The key computational operation when using Conjugate Gradients is the
evaluation of the matrix vector product :math:`Sx` for an arbitrary
vector :math:`x`. There are two ways in which this product can be
evaluated, and this can be controlled using
``Solver::Options::use_explicit_schur_complement``. Depending on the
problem at hand, the performance difference between these two methods
can be quite substantial.

 1. **Implicit** This is the default. Implicit evaluation is suitable for
    large problems where the cost of computing and storing the Schur
    Complement :math:`S` is prohibitive. Because PCG only needs
    access to :math:`S` via its product with a vector, one way to
    evaluate :math:`Sx` is to observe that

    .. math:: x_1 &= E^\top x\\
              x_2 &= C^{-1} x_1\\
              x_3 &= Ex_2\\
              x_4 &= Bx\\
              Sx &= x_4 - x_3
       :label: schurtrick1

    Thus, we can run PCG on :math:`S` with the same computational
    effort per iteration as PCG on :math:`H`, while reaping the
    benefits of a more powerful preconditioner. In fact, we do not
    even need to compute :math:`H`, :eq:`schurtrick1` can be
    implemented using just the columns of :math:`J`.

    Equation :eq:`schurtrick1` is closely related to *Domain
    Decomposition methods* for solving large linear systems that
    arise in structural engineering and partial differential
    equations. In the language of Domain Decomposition, each point in
    a bundle adjustment problem is a domain, and the cameras form the
    interface between these domains. The iterative solution of the
    Schur complement then falls within the sub-category of techniques
    known as Iterative Sub-structuring [Saad]_ [Mathew]_.

 2. **Explicit** The complexity of implicit matrix-vector product
    evaluation scales with the number of non-zeros in the
    Jacobian. For small to medium sized problems, the cost of
    constructing the Schur Complement is small enough that it is
    better to construct it explicitly in memory and use it to
    evaluate the product :math:`Sx`.

.. NOTE::

   In exact arithmetic, the choice of implicit versus explicit Schur
   complement would have no impact on solution quality. However, in
   practice if the Jacobian is poorly conditioned, one may observe
   (usually small) differences in solution quality. This is a
   natural consequence of performing computations in finite arithmetic.
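
A sketch of using this solver (``problem`` assumed to exist):

.. code-block:: c++

   ceres::Solver::Options options;
   options.linear_solver_type = ceres::ITERATIVE_SCHUR;
   options.preconditioner_type = ceres::SCHUR_JACOBI;
   // Worth considering for small to medium sized problems; see the note above.
   options.use_explicit_schur_complement = true;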


.. _section-preconditioner:

Preconditioner
--------------

The convergence rate of Conjugate Gradients for
solving :eq:`normal` depends on the distribution of eigenvalues
of :math:`H` [Saad]_. A useful upper bound is
:math:`\sqrt{\kappa(H)}`, where :math:`\kappa(H)` is the condition
number of the matrix :math:`H`. For most bundle adjustment problems,
:math:`\kappa(H)` is high and a direct application of Conjugate
Gradients to :eq:`normal` results in extremely poor performance.

The solution to this problem is to replace :eq:`normal` with a
*preconditioned* system. Given a linear system :math:`Ax = b` and a
preconditioner :math:`M`, the preconditioned system is given by
:math:`M^{-1}Ax = M^{-1}b`. The resulting algorithm is known as the
Preconditioned Conjugate Gradients algorithm (PCG) and its worst case
complexity now depends on the condition number of the *preconditioned*
matrix :math:`\kappa(M^{-1}A)`.

The computational cost of using a preconditioner :math:`M` is the cost
of computing :math:`M` and evaluating the product :math:`M^{-1}y` for
arbitrary vectors :math:`y`. Thus, there are two competing factors to
consider: How much of :math:`H`'s structure is captured by :math:`M`
so that the condition number :math:`\kappa(HM^{-1})` is low, and the
computational cost of constructing and using :math:`M`. The ideal
preconditioner would be one for which :math:`\kappa(M^{-1}A)
=1`. :math:`M=A` achieves this, but it is not a practical choice, as
applying this preconditioner would require solving a linear system
equivalent to the unpreconditioned problem. It is usually the case
that the more information :math:`M` has about :math:`H`, the more
expensive it is to use. For example, Incomplete Cholesky factorization
based preconditioners have much better convergence behavior than the
Jacobi preconditioner, but are also much more expensive.

The simplest of all preconditioners is the diagonal or Jacobi
preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
block structured matrices like :math:`H` can be generalized to the
block Jacobi preconditioner. Ceres implements the block Jacobi
preconditioner and refers to it as ``JACOBI``. When used with
:ref:`section-cgnr` it refers to the block diagonal of :math:`H` and
when used with :ref:`section-iterative_schur` it refers to the block
diagonal of :math:`B` [Mandel]_.

Another obvious choice for :ref:`section-iterative_schur` is the block
diagonal of the Schur complement matrix :math:`S`, i.e., the block
Jacobi preconditioner for :math:`S`. Ceres implements it and refers to
it as the ``SCHUR_JACOBI`` preconditioner.

For bundle adjustment problems arising in reconstruction from
community photo collections, more effective preconditioners can be
constructed by analyzing and exploiting the camera-point visibility
structure of the scene [KushalAgarwal]_. Ceres implements the two
visibility based preconditioners described by Kushal & Agarwal as
``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. These are fairly new
preconditioners and Ceres' implementation of them is in its early
stages and is not as mature as the other preconditioners described
above.
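
A sketch of selecting a preconditioner for use with ``ITERATIVE_SCHUR``:

.. code-block:: c++

   ceres::Solver::Options options;
   options.linear_solver_type = ceres::ITERATIVE_SCHUR;
   options.preconditioner_type = ceres::SCHUR_JACOBI;
   // The visibility based preconditioners are selected with
   // ceres::CLUSTER_JACOBI or ceres::CLUSTER_TRIDIAGONAL instead.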

.. _section-ordering:

Ordering
--------

The order in which variables are eliminated in a linear solver can
have a significant impact on the efficiency and accuracy of the
method. For example when doing sparse Cholesky factorization, there
are matrices for which a good ordering will give a Cholesky factor
with :math:`O(n)` storage, whereas a bad ordering will result in a
completely dense factor.

Ceres allows the user to provide varying amounts of hints to the
solver about the variable elimination ordering to use. This can range
from no hints, where the solver is free to decide the best ordering
based on the user's choices like the linear solver being used, to an
exact order in which the variables should be eliminated, and a variety
of possibilities in between.

Instances of the :class:`ParameterBlockOrdering` class are used to
communicate this information to Ceres.

Formally an ordering is an ordered partitioning of the parameter
blocks. Each parameter block belongs to exactly one group, and each
group has a unique integer associated with it, that determines its
order in the set of groups. We call these groups *Elimination Groups*.

Given such an ordering, Ceres ensures that the parameter blocks in the
lowest numbered elimination group are eliminated first, and then the
parameter blocks in the next lowest numbered elimination group and so
on. Within each elimination group, Ceres is free to order the
parameter blocks as it chooses. For example, consider the linear system

.. math::
   x + y &= 3\\
   2x + 3y &= 7

There are two ways in which it can be solved. First eliminating
:math:`x` from the two equations, solving for :math:`y` and then back
substituting for :math:`x`, or first eliminating :math:`y`, solving
for :math:`x` and back substituting for :math:`y`. The user can
construct three orderings here.

1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
3. :math:`\{0: x, y\}` : Solver gets to decide the elimination order.

Thus, to have Ceres determine the ordering automatically using
heuristics, put all the variables in the same elimination group. The
identity of the group does not matter. This is the same as not
specifying an ordering at all. To control the ordering for every
variable, create an elimination group per variable, ordering them in
the desired order.

If the user is using one of the Schur solvers (``DENSE_SCHUR``,
``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses to specify an
ordering, it must have one important property. The lowest numbered
elimination group must form an independent set in the graph
corresponding to the Hessian, or in other words, no two parameter
blocks in the first elimination group should co-occur in the same
residual block. For the best performance, this elimination group
should be as large as possible. For standard bundle adjustment
problems, this corresponds to the first elimination group containing
all the 3d points, and the second containing all the camera
parameter blocks.
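
For a bundle adjustment problem, a sketch of such an ordering (where
``points`` and ``cameras`` are assumed to be containers of pointers to
the corresponding parameter blocks) might look like:

.. code-block:: c++

   ceres::Solver::Options options;
   options.linear_solver_type = ceres::SPARSE_SCHUR;

   ceres::ParameterBlockOrdering* ordering =
       new ceres::ParameterBlockOrdering;
   // Eliminate all the points first ...
   for (double* point : points) {
     ordering->AddElementToGroup(point, 0);
   }
   // ... and then the cameras.
   for (double* camera : cameras) {
     ordering->AddElementToGroup(camera, 1);
   }
   options.linear_solver_ordering.reset(ordering);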

If the user leaves the choice to Ceres, then the solver uses an
approximate maximum independent set algorithm to identify the first
elimination group [LiSaad]_.

.. _section-solver-options:

:class:`Solver::Options`
========================

.. class:: Solver::Options

   :class:`Solver::Options` controls the overall behavior of the
   solver. We list the various settings and their default values below.

.. function:: bool Solver::Options::IsValid(string* error) const

   Validates the values in the options struct and returns true on
   success. If there is a problem, the method returns false with
   ``error`` containing a textual description of the cause.
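
   For example, a sketch of validating a (deliberately broken)
   configuration before solving:

   .. code-block:: c++

      ceres::Solver::Options options;
      options.max_num_iterations = -1;  // Not a valid iteration count.

      std::string error;
      if (!options.IsValid(&error)) {
        std::cout << "Invalid options: " << error << "\n";
      }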

.. member:: MinimizerType Solver::Options::minimizer_type

   Default: ``TRUST_REGION``

   Choose between ``LINE_SEARCH`` and ``TRUST_REGION`` algorithms. See
   :ref:`section-trust-region-methods` and
   :ref:`section-line-search-methods` for more details.

.. member:: LineSearchDirectionType Solver::Options::line_search_direction_type

   Default: ``LBFGS``

   Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
   ``BFGS`` and ``LBFGS``.

.. member:: LineSearchType Solver::Options::line_search_type

   Default: ``WOLFE``

   Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
   Note that in order for the assumptions underlying the ``BFGS`` and
   ``LBFGS`` line search direction algorithms to be guaranteed to be
   satisfied, the ``WOLFE`` line search should be used.

.. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type

   Default: ``FLETCHER_REEVES``

   Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIERE`` and
   ``HESTENES_STIEFEL``.

.. member:: int Solver::Options::max_lbfgs_rank

   Default: 20

   The L-BFGS Hessian approximation is a low rank approximation to the
   inverse of the Hessian matrix. The rank of the approximation
   determines (linearly) the space and time complexity of using the
   approximation. The higher the rank, the better the quality of the
   approximation. The increase in quality is, however, bounded for a
   number of reasons.

   1. The method only uses secant information and not actual
      derivatives.

   2. The Hessian approximation is constrained to be positive
      definite.

   So increasing this rank to a large number will increase the time and
   space complexity without a corresponding increase in solution
   quality. There are no hard and fast rules for choosing the maximum
   rank. The best choice usually requires some problem specific
   experimentation.

.. member:: bool Solver::Options::use_approximate_eigenvalue_bfgs_scaling

   Default: ``false``

   As part of the ``BFGS`` update step / ``LBFGS`` right-multiply
   step, the initial inverse Hessian approximation is taken to be the
   Identity. However, [Oren]_ showed that using instead :math:`I *
   \gamma`, where :math:`\gamma` is a scalar chosen to approximate an
   eigenvalue of the true inverse Hessian, can result in improved
   convergence in a wide variety of cases. Setting
   ``use_approximate_eigenvalue_bfgs_scaling`` to true enables this
   scaling in ``BFGS`` (before first iteration) and ``LBFGS`` (at each
   iteration).

   Precisely, approximate eigenvalue scaling equates to

   .. math:: \gamma = \frac{y_k' s_k}{y_k' y_k}

   With:

   .. math:: y_k = \nabla f_{k+1} - \nabla f_k
   .. math:: s_k = x_{k+1} - x_k

   where :math:`f()` is the line search objective and :math:`x` the
   vector of parameter values [NocedalWright]_.

   It is important to note that approximate eigenvalue scaling does
   **not** *always* improve convergence, and that it can in fact
   *significantly* degrade performance for certain classes of problem,
   which is why it is disabled by default. In particular it can
   degrade performance when the sensitivity of the problem to different
   parameters varies significantly, as in this case a single scalar
   factor fails to capture this variation and detrimentally downscales
   parts of the Jacobian approximation which correspond to
   low-sensitivity parameters. It can also reduce the robustness of the
   solution to errors in the Jacobians.

.. member:: LineSearchInterpolationType Solver::Options::line_search_interpolation_type

   Default: ``CUBIC``

   Degree of the polynomial used to approximate the objective
   function. Valid values are ``BISECTION``, ``QUADRATIC`` and
   ``CUBIC``.

.. member:: double Solver::Options::min_line_search_step_size

   The line search terminates if:

   .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and
   :math:`\Delta x_k` is the step change in the parameter values at
   the :math:`k`-th iteration.

.. member:: double Solver::Options::line_search_sufficient_function_decrease

   Default: ``1e-4``

   Solving the line search problem exactly is computationally
   prohibitive. Fortunately, line search based optimization algorithms
   can still guarantee convergence if instead of an exact solution,
   the line search algorithm returns a solution which decreases the
   value of the objective function sufficiently. More precisely, we
   are looking for a step size s.t.

   .. math:: f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]

   This condition is known as the Armijo condition.

.. member:: double Solver::Options::max_line_search_step_contraction

   Default: ``1e-3``

   In each iteration of the line search,

   .. math:: \text{new_step_size} >= \text{max_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: double Solver::Options::min_line_search_step_contraction

   Default: ``0.6``

   In each iteration of the line search,

   .. math:: \text{new_step_size} <= \text{min_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: int Solver::Options::max_num_line_search_step_size_iterations

   Default: ``20``

   Maximum number of trial step size iterations during each line
   search. If a step size satisfying the search conditions cannot be
   found within this number of trials, the line search will stop.

   As this is an 'artificial' constraint (one imposed by the user, not
   the underlying math), if ``WOLFE`` line search is being used, *and*
   points satisfying the Armijo sufficient (function) decrease
   condition have been found during the current search (in :math:`<=`
   ``max_num_line_search_step_size_iterations``), then the step size
   with the lowest function value which satisfies the Armijo condition
   will be returned as the new valid step, even though it does *not*
   satisfy the strong Wolfe conditions. This behaviour protects
   against early termination of the optimizer at a sub-optimal point.

.. member:: int Solver::Options::max_num_line_search_direction_restarts

   Default: ``5``

   Maximum number of restarts of the line search direction algorithm
   before terminating the optimization. Restarts of the line search
   direction algorithm occur when the current algorithm fails to
   produce a new descent direction. This typically indicates a
   numerical failure, or a breakdown in the validity of the
   approximations used.

.. member:: double Solver::Options::line_search_sufficient_curvature_decrease

   Default: ``0.9``

   The strong Wolfe conditions consist of the Armijo sufficient
   decrease condition, and an additional requirement that the
   step size be chosen s.t. the *magnitude* ('strong' Wolfe
   conditions) of the gradient along the search direction
   decreases sufficiently. Precisely, this second condition
   is that we seek a step size s.t.

   .. math:: \|f'(\text{step_size})\| <= \text{sufficient_curvature_decrease} * \|f'(0)\|

   where :math:`f()` is the line search objective and :math:`f'()` is the derivative
   of :math:`f` with respect to the step size: :math:`\frac{d f}{d~\text{step size}}`.

.. member:: double Solver::Options::max_line_search_step_expansion

   Default: ``10.0``

   During the bracketing phase of a Wolfe line search, the step size
   is increased until either a point satisfying the Wolfe conditions
   is found, or an upper bound for a bracket containing a point
   satisfying the conditions is found. Precisely, at each iteration
   of the expansion:

   .. math:: \text{new_step_size} <= \text{max_step_expansion} * \text{step_size}

   By definition, for expansion:

   .. math:: \text{max_step_expansion} > 1.0

.. member:: TrustRegionStrategyType Solver::Options::trust_region_strategy_type

   Default: ``LEVENBERG_MARQUARDT``

   The trust region step computation algorithm used by
   Ceres. Currently ``LEVENBERG_MARQUARDT`` and ``DOGLEG`` are the two
   valid choices. See :ref:`section-levenberg-marquardt` and
   :ref:`section-dogleg` for more details.

.. member:: DoglegType Solver::Options::dogleg_type

   Default: ``TRADITIONAL_DOGLEG``

   Ceres supports two different dogleg strategies: the
   ``TRADITIONAL_DOGLEG`` method by Powell and the ``SUBSPACE_DOGLEG``
   method described by [ByrdSchnabel]_ . See :ref:`section-dogleg`
   for more details.

.. member:: bool Solver::Options::use_nonmonotonic_steps

   Default: ``false``

   Relax the requirement that the trust-region algorithm take strictly
   decreasing steps. See :ref:`section-non-monotonic-steps` for more
   details.

.. member:: int Solver::Options::max_consecutive_nonmonotonic_steps

   Default: ``5``

   The window size used by the step selection algorithm to accept
   non-monotonic steps.

.. member:: int Solver::Options::max_num_iterations

   Default: ``50``

   Maximum number of iterations for which the solver should run.

.. member:: double Solver::Options::max_solver_time_in_seconds

   Default: ``1e6``

   Maximum amount of time for which the solver should run.

.. member:: int Solver::Options::num_threads

   Default: ``1``

   Number of threads used by Ceres to evaluate the Jacobian.

.. member:: double Solver::Options::initial_trust_region_radius

   Default: ``1e4``

   The size of the initial trust region. When the
   ``LEVENBERG_MARQUARDT`` strategy is used, the reciprocal of this
   number is the initial regularization parameter.

.. member:: double Solver::Options::max_trust_region_radius

   Default: ``1e16``

   The trust region radius is not allowed to grow beyond this value.

.. member:: double Solver::Options::min_trust_region_radius

   Default: ``1e-32``

   The solver terminates when the trust region becomes smaller than
   this value.

.. member:: double Solver::Options::min_relative_decrease

   Default: ``1e-3``

   Lower threshold for relative decrease before a trust-region step is
   accepted.

.. member:: double Solver::Options::min_lm_diagonal

   Default: ``1e-6``

   The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
   regularize the trust region step. This is the lower bound on
   the values of this diagonal matrix.

.. member:: double Solver::Options::max_lm_diagonal

   Default: ``1e32``

   The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
   regularize the trust region step. This is the upper bound on
   the values of this diagonal matrix.

.. member:: int Solver::Options::max_num_consecutive_invalid_steps

   Default: ``5``

   The step returned by a trust region strategy can sometimes be
   numerically invalid, usually because of conditioning
   issues. Instead of crashing or stopping the optimization, the
   optimizer can go ahead and try solving with a smaller trust
   region/better conditioned problem. This parameter sets the number
   of consecutive retries before the minimizer gives up.

.. member:: double Solver::Options::function_tolerance

   Default: ``1e-6``

   Solver terminates if

   .. math:: \frac{|\Delta \text{cost}|}{\text{cost}} <= \text{function_tolerance}

   where :math:`\Delta \text{cost}` is the change in objective
   function value (up or down) in the current iteration of
   Levenberg-Marquardt.

.. member:: double Solver::Options::gradient_tolerance

   Default: ``1e-10``

   Solver terminates if

   .. math:: \|x - \Pi \boxplus(x, -g(x))\|_\infty <= \text{gradient_tolerance}

   where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
   is projection onto the bounds constraints and :math:`\boxplus` is
   the Plus operation for the overall local parameterization associated
   with the parameter vector.

.. member:: double Solver::Options::parameter_tolerance

   Default: ``1e-8``

   Solver terminates if

   .. math:: \|\Delta x\| <= (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}

   where :math:`\Delta x` is the step computed by the linear solver in
   the current iteration.
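
   As a sketch, tighter (or looser) convergence criteria can be set
   together with an iteration budget as follows:

   .. code-block:: c++

      ceres::Solver::Options options;
      options.max_num_iterations = 100;
      options.function_tolerance = 1e-8;
      options.gradient_tolerance = 1e-12;
      options.parameter_tolerance = 1e-10;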

.. member:: LinearSolverType Solver::Options::linear_solver_type

   Default: ``SPARSE_NORMAL_CHOLESKY`` / ``DENSE_QR``

   Type of linear solver used to compute the solution to the linear
   least squares problem in each iteration of the Levenberg-Marquardt
   algorithm. If Ceres is built with support for ``SuiteSparse`` or
   ``CXSparse`` or ``Eigen``'s sparse Cholesky factorization, the
   default is ``SPARSE_NORMAL_CHOLESKY``; otherwise it is ``DENSE_QR``.

.. member:: PreconditionerType Solver::Options::preconditioner_type

   Default: ``JACOBI``

   The preconditioner used by the iterative linear solver. The default
   is the block Jacobi preconditioner. Valid values are (in increasing
   order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``,
   ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See
   :ref:`section-preconditioner` for more details.

.. member:: VisibilityClusteringType Solver::Options::visibility_clustering_type

   Default: ``CANONICAL_VIEWS``

   Type of clustering algorithm to use when constructing a visibility
   based preconditioner. The original visibility based preconditioning
   paper and implementation only used the canonical views algorithm.

   This algorithm gives high quality results but for large dense
   graphs can be particularly expensive, as its worst case complexity
   is cubic in the size of the graph.

   Another option is to use ``SINGLE_LINKAGE`` which is a simple
   thresholded single linkage clustering algorithm that only pays
   attention to tightly coupled blocks in the Schur complement. This
   is a fast algorithm that works well.

   The optimal choice of the clustering algorithm depends on the
   sparsity structure of the problem, but generally speaking we
   recommend that you try ``CANONICAL_VIEWS`` first and if it is too
   expensive try ``SINGLE_LINKAGE``.

.. member:: DenseLinearAlgebraLibraryType Solver::Options::dense_linear_algebra_library_type

   Default: ``EIGEN``

   Ceres supports using multiple dense linear algebra libraries for
   dense matrix factorizations. Currently ``EIGEN`` and ``LAPACK`` are
   the valid choices. ``EIGEN`` is always available, ``LAPACK`` refers
   to the system ``BLAS + LAPACK`` library which may or may not be
   available.

   This setting affects the ``DENSE_QR``, ``DENSE_NORMAL_CHOLESKY``
   and ``DENSE_SCHUR`` solvers. For small to moderate sized problems
   ``EIGEN`` is a fine choice but for large problems, an optimized
   ``LAPACK + BLAS`` implementation can make a substantial difference
   in performance.

.. member:: SparseLinearAlgebraLibraryType Solver::Options::sparse_linear_algebra_library_type

   Default: The highest available according to: ``SUITE_SPARSE`` >
   ``CX_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``

   Ceres supports the use of three sparse linear algebra libraries,
   ``SuiteSparse``, which is enabled by setting this parameter to
   ``SUITE_SPARSE``, ``CXSparse``, which can be selected by setting
   this parameter to ``CX_SPARSE`` and ``Eigen`` which is enabled by
   setting this parameter to ``EIGEN_SPARSE``. Lastly, ``NO_SPARSE``
   means that no sparse linear solver should be used; note that this is
   irrespective of whether Ceres was compiled with support for one.

   ``SuiteSparse`` is a sophisticated and complex sparse linear
   algebra library and should be used in general.

   If your needs/platforms prevent you from using ``SuiteSparse``,
   consider using ``CXSparse``, which is a much smaller, easier to
   build library. As can be expected, its performance on large
   problems is not comparable to that of ``SuiteSparse``.

   Last but not least you can use the sparse linear algebra
   routines in ``Eigen``. Currently the performance of this library is
   the poorest of the three. But this should change in the near
   future.

   Another thing to consider here is that the sparse Cholesky
   factorization libraries in Eigen are licensed under ``LGPL`` and
   building Ceres with support for ``EIGEN_SPARSE`` will result in an
   LGPL licensed library (since the corresponding code from Eigen is
   compiled into the library).

   The upside is that you do not need to build and link to an external
   library to use ``EIGEN_SPARSE``.
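
   A sketch of explicitly selecting the linear algebra backends
   (assuming Ceres was built with the corresponding libraries):

   .. code-block:: c++

      ceres::Solver::Options options;
      options.dense_linear_algebra_library_type = ceres::LAPACK;
      options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;
      options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;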


.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering

   Default: ``NULL``

   An instance of the ordering object informs the solver about the
   desired order in which parameter blocks should be eliminated by the
   linear solvers.

   If ``NULL``, the solver is free to choose an ordering that it
   thinks is best.

   See :ref:`section-ordering` for more details.
1296
1297.. member:: bool Solver::Options::use_explicit_schur_complement
1298
1299 Default: ``false``
1300
1301 Use an explicitly computed Schur complement matrix with
1302 ``ITERATIVE_SCHUR``.
1303
   By default this option is disabled and ``ITERATIVE_SCHUR``
   evaluates matrix-vector products between the Schur complement
   and a vector implicitly by exploiting the algebraic expression
   for the Schur complement.
1308
1309 The cost of this evaluation scales with the number of non-zeros in
1310 the Jacobian.
1311
1312 For small to medium sized problems there is a sweet spot where
1313 computing the Schur complement is cheap enough that it is much more
1314 efficient to explicitly compute it and use it for evaluating the
1315 matrix-vector products.
1316
1317 Enabling this option tells ``ITERATIVE_SCHUR`` to use an explicitly
1318 computed Schur complement. This can improve the performance of the
1319 ``ITERATIVE_SCHUR`` solver significantly.
1320
   .. NOTE::
1322
1323 This option can only be used with the ``SCHUR_JACOBI``
1324 preconditioner.
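
   A minimal configuration sketch:

   .. code-block:: c++

      ceres::Solver::Options options;
      options.linear_solver_type = ceres::ITERATIVE_SCHUR;
      // The explicit Schur complement only works with SCHUR_JACOBI.
      options.preconditioner_type = ceres::SCHUR_JACOBI;
      options.use_explicit_schur_complement = true;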
1325
.. member:: bool Solver::Options::use_postordering
1327
1328 Default: ``false``
1329
1330 Sparse Cholesky factorization algorithms use a fill-reducing
1331 ordering to permute the columns of the Jacobian matrix. There are
1332 two ways of doing this.
1333
1334 1. Compute the Jacobian matrix in some order and then have the
1335 factorization algorithm permute the columns of the Jacobian.
1336
1337 2. Compute the Jacobian with its columns already permuted.
1338
   The first option incurs a significant memory penalty, since the
   factorization algorithm has to make a copy of the permuted
   Jacobian matrix. Thus Ceres pre-permutes the columns of the
   Jacobian matrix, and generally speaking there is no performance
   penalty for doing so.
1344
1345 In some rare cases, it is worth using a more complicated reordering
1346 algorithm which has slightly better runtime performance at the
1347 expense of an extra copy of the Jacobian matrix. Setting
1348 ``use_postordering`` to ``true`` enables this tradeoff.
1349
.. member:: bool Solver::Options::dynamic_sparsity

   Default: ``false``

   Some non-linear least squares problems are symbolically dense but
   numerically sparse, i.e., at any given state only a small number
   of Jacobian entries are non-zero, but the position and number of
   non-zeros are different depending on the state. For these problems
   it can be useful to factorize the sparse Jacobian at each solver
   iteration instead of including all of the zero entries in a single
   general factorization.
1359
1360 If your problem does not have this property (or you do not know),
1361 then it is probably best to keep this false, otherwise it will
1362 likely lead to worse performance.
1363
   This setting only affects the ``SPARSE_NORMAL_CHOLESKY`` solver.
1365
1366.. member:: int Solver::Options::min_linear_solver_iterations
1367
1368 Default: ``0``
1369
1370 Minimum number of iterations used by the linear solver. This only
1371 makes sense when the linear solver is an iterative solver, e.g.,
1372 ``ITERATIVE_SCHUR`` or ``CGNR``.
1373
1374.. member:: int Solver::Options::max_linear_solver_iterations
1375
1376 Default: ``500``
1377
   Maximum number of iterations used by the linear solver. This only
1379 makes sense when the linear solver is an iterative solver, e.g.,
1380 ``ITERATIVE_SCHUR`` or ``CGNR``.
1381
1382.. member:: double Solver::Options::eta
1383
1384 Default: ``1e-1``
1385
1386 Forcing sequence parameter. The truncated Newton solver uses this
1387 number to control the relative accuracy with which the Newton step
1388 is computed. This constant is passed to
1389 ``ConjugateGradientsSolver`` which uses it to terminate the
1390 iterations when
1391
1392 .. math:: \frac{Q_i - Q_{i-1}}{Q_i} < \frac{\eta}{i}
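
   A sketch of configuring an inexact Newton solve with these options
   (the numeric values are purely illustrative):

   .. code-block:: c++

      ceres::Solver::Options options;
      options.linear_solver_type = ceres::ITERATIVE_SCHUR;
      options.min_linear_solver_iterations = 5;
      options.max_linear_solver_iterations = 100;
      // Smaller eta => more accurate (and more expensive) Newton steps.
      options.eta = 1e-2;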
1393
1394.. member:: bool Solver::Options::jacobi_scaling
1395
1396 Default: ``true``
1397
1398 ``true`` means that the Jacobian is scaled by the norm of its
1399 columns before being passed to the linear solver. This improves the
1400 numerical conditioning of the normal equations.
1401
1402.. member:: bool Solver::Options::use_inner_iterations
1403
1404 Default: ``false``
1405
1406 Use a non-linear version of a simplified variable projection
1407 algorithm. Essentially this amounts to doing a further optimization
1408 on each Newton/Trust region step using a coordinate descent
1409 algorithm. For more details, see :ref:`section-inner-iterations`.
1410
1411.. member:: double Solver::Options::inner_iteration_tolerance
1412
1413 Default: ``1e-3``
1414
1415 Generally speaking, inner iterations make significant progress in
1416 the early stages of the solve and then their contribution drops
1417 down sharply, at which point the time spent doing inner iterations
1418 is not worth it.
1419
1420 Once the relative decrease in the objective function due to inner
1421 iterations drops below ``inner_iteration_tolerance``, the use of
1422 inner iterations in subsequent trust region minimizer iterations is
1423 disabled.
1424
1425.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
1426
1427 Default: ``NULL``
1428
   If :member:`Solver::Options::use_inner_iterations` is true, then
   the user has two choices.
1431
1432 1. Let the solver heuristically decide which parameter blocks to
1433 optimize in each inner iteration. To do this, set
1434 :member:`Solver::Options::inner_iteration_ordering` to ``NULL``.
1435
   2. Specify a collection of ordered independent sets. The lower
      numbered groups are optimized before the higher numbered groups
1438 during the inner optimization phase. Each group must be an
1439 independent set. Not all parameter blocks need to be included in
1440 the ordering.
1441
1442 See :ref:`section-ordering` for more details.
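
   A short sketch (``point_block`` and ``camera_block`` are
   placeholder parameter blocks) of enabling inner iterations with an
   explicit ordering:

   .. code-block:: c++

      ceres::Solver::Options options;
      options.use_inner_iterations = true;

      auto* ordering = new ceres::ParameterBlockOrdering;
      ordering->AddElementToGroup(point_block, 0);   // optimized first
      ordering->AddElementToGroup(camera_block, 1);  // optimized second
      options.inner_iteration_ordering.reset(ordering);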
1443
1444.. member:: LoggingType Solver::Options::logging_type
1445
1446 Default: ``PER_MINIMIZER_ITERATION``
1447
1448.. member:: bool Solver::Options::minimizer_progress_to_stdout
1449
1450 Default: ``false``
1451
1452 By default the :class:`Minimizer` progress is logged to ``STDERR``
1453 depending on the ``vlog`` level. If this flag is set to true, and
1454 :member:`Solver::Options::logging_type` is not ``SILENT``, the logging
1455 output is sent to ``STDOUT``.
1456
1457 For ``TRUST_REGION_MINIMIZER`` the progress display looks like
1458
1459 .. code-block:: bash
1460
1461 iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
1462 0 4.185660e+06 0.00e+00 1.09e+08 0.00e+00 0.00e+00 1.00e+04 0 7.59e-02 3.37e-01
1463 1 1.062590e+05 4.08e+06 8.99e+06 5.36e+02 9.82e-01 3.00e+04 1 1.65e-01 5.03e-01
1464 2 4.992817e+04 5.63e+04 8.32e+06 3.19e+02 6.52e-01 3.09e+04 1 1.45e-01 6.48e-01
1465
1466 Here
1467
1468 #. ``cost`` is the value of the objective function.
1469 #. ``cost_change`` is the change in the value of the objective
1470 function if the step computed in this iteration is accepted.
1471 #. ``|gradient|`` is the max norm of the gradient.
1472 #. ``|step|`` is the change in the parameter vector.
1473 #. ``tr_ratio`` is the ratio of the actual change in the objective
1474 function value to the change in the value of the trust
1475 region model.
1476 #. ``tr_radius`` is the size of the trust region radius.
1477 #. ``ls_iter`` is the number of linear solver iterations used to
1478 compute the trust region step. For direct/factorization based
1479 solvers it is always 1, for iterative solvers like
1480 ``ITERATIVE_SCHUR`` it is the number of iterations of the
1481 Conjugate Gradients algorithm.
   #. ``iter_time`` is the time taken by the current iteration.
1483 #. ``total_time`` is the total time taken by the minimizer.
1484
1485 For ``LINE_SEARCH_MINIMIZER`` the progress display looks like
1486
1487 .. code-block:: bash
1488
1489 0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e: 0 it: 2.98e-02 tt: 8.50e-02
1490 1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e: 1 it: 4.54e-02 tt: 1.31e-01
1491 2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e: 1 it: 4.96e-02 tt: 1.81e-01
1492
1493 Here
1494
1495 #. ``f`` is the value of the objective function.
1496 #. ``d`` is the change in the value of the objective function if
1497 the step computed in this iteration is accepted.
1498 #. ``g`` is the max norm of the gradient.
1499 #. ``h`` is the change in the parameter vector.
1500 #. ``s`` is the optimal step length computed by the line search.
   #. ``e`` is the number of function evaluations used by the line search.
   #. ``it`` is the time taken by the current iteration.
1502 #. ``tt`` is the total time taken by the minimizer.
1503
1504.. member:: vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
1505
1506 Default: ``empty``
1507
1508 List of iterations at which the trust region minimizer should dump
1509 the trust region problem. Useful for testing and benchmarking. If
1510 ``empty``, no problems are dumped.
1511
1512.. member:: string Solver::Options::trust_region_problem_dump_directory
1513
1514 Default: ``/tmp``
1515
   Directory to which the problems should be written. Should be
1517 non-empty if
1518 :member:`Solver::Options::trust_region_minimizer_iterations_to_dump` is
1519 non-empty and
1520 :member:`Solver::Options::trust_region_problem_dump_format_type` is not
1521 ``CONSOLE``.
1522
1523.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format
1524
1525 Default: ``TEXTFILE``
1526
1527 The format in which trust region problems should be logged when
1528 :member:`Solver::Options::trust_region_minimizer_iterations_to_dump`
1529 is non-empty. There are three options:
1530
1531 * ``CONSOLE`` prints the linear least squares problem in a human
1532 readable format to ``stderr``. The Jacobian is printed as a
1533 dense matrix. The vectors :math:`D`, :math:`x` and :math:`f` are
1534 printed as dense vectors. This should only be used for small
1535 problems.
1536
1537 * ``TEXTFILE`` Write out the linear least squares problem to the
1538 directory pointed to by
1539 :member:`Solver::Options::trust_region_problem_dump_directory` as
1540 text files which can be read into ``MATLAB/Octave``. The Jacobian
1541 is dumped as a text file containing :math:`(i,j,s)` triplets, the
     vectors :math:`D`, :math:`x` and :math:`f` are dumped as text files
1543 containing a list of their values.
1544
1545 A ``MATLAB/Octave`` script called
1546 ``ceres_solver_iteration_???.m`` is also output, which can be
1547 used to parse and load the problem into memory.
1548
1549.. member:: bool Solver::Options::check_gradients
1550
1551 Default: ``false``
1552
1553 Check all Jacobians computed by each residual block with finite
1554 differences. This is expensive since it involves computing the
1555 derivative by normal means (e.g. user specified, autodiff, etc),
1556 then also computing it using finite differences. The results are
1557 compared, and if they differ substantially, the optimization fails
1558 and the details are stored in the solver summary.
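
   A minimal sketch of enabling the gradient checker, assuming
   ``problem`` is an already constructed :class:`Problem`:

   .. code-block:: c++

      ceres::Solver::Options options;
      options.check_gradients = true;
      // Tolerance on the relative difference between the user supplied
      // and finite difference Jacobians before an error is reported.
      options.gradient_check_relative_precision = 1e-6;

      ceres::Solver::Summary summary;
      ceres::Solve(options, &problem, &summary);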
1559
1560.. member:: double Solver::Options::gradient_check_relative_precision
1561
   Default: ``1e-8``
1563
   Relative precision to check for in the gradient checker. If the
   relative difference between an element of the user supplied
   Jacobian and its finite difference estimate exceeds this number,
   then the Jacobian for that cost term is dumped.
1567
1568.. member:: double Solver::Options::gradient_check_numeric_derivative_relative_step_size
1569
1570 Default: ``1e-6``
1571
1572 .. NOTE::
1573
1574 This option only applies to the numeric differentiation used for
      checking the user provided derivatives when
1576 `Solver::Options::check_gradients` is true. If you are using
1577 :class:`NumericDiffCostFunction` and are interested in changing
1578 the step size for numeric differentiation in your cost function,
1579 please have a look at :class:`NumericDiffOptions`.
1580
1581 Relative shift used for taking numeric derivatives when
1582 `Solver::Options::check_gradients` is `true`.
1583
1584 For finite differencing, each dimension is evaluated at slightly
1585 shifted values, e.g., for forward differences, the numerical
1586 derivative is
1587
1588 .. math::
1589
1590 \delta &= gradient\_check\_numeric\_derivative\_relative\_step\_size\\
1591 \Delta f &= \frac{f((1 + \delta) x) - f(x)}{\delta x}
1592
1593 The finite differencing is done along each dimension. The reason to
1594 use a relative (rather than absolute) step size is that this way,
1595 numeric differentiation works for functions where the arguments are
1596 typically large (e.g. :math:`10^9`) and when the values are small
1597 (e.g. :math:`10^{-5}`). It is possible to construct *torture cases*
1598 which break this finite difference heuristic, but they do not come
1599 up often in practice.
1600
.. member:: vector<IterationCallback*> Solver::Options::callbacks
1602
1603 Callbacks that are executed at the end of each iteration of the
1604 :class:`Minimizer`. They are executed in the order that they are
1605 specified in this vector. By default, parameter blocks are updated
1606 only at the end of the optimization, i.e., when the
1607 :class:`Minimizer` terminates. This behavior is controlled by
1608 :member:`Solver::Options::update_state_every_iteration`. If the user
1609 wishes to have access to the updated parameter blocks when his/her
1610 callbacks are executed, then set
1611 :member:`Solver::Options::update_state_every_iteration` to true.
1612
1613 The solver does NOT take ownership of these pointers.
1614
1615.. member:: bool Solver::Options::update_state_every_iteration
1616
1617 Default: ``false``
1618
1619 If true, the user's parameter blocks are updated at the end of
1620 every Minimizer iteration, otherwise they are updated when the
1621 Minimizer terminates. This is useful if, for example, the user
1622 wishes to visualize the state of the optimization every iteration
1623 (in combination with an IterationCallback).
1624
1625 **Note**: If :member:`Solver::Options::evaluation_callback` is set,
1626 then the behaviour of this flag is slightly different in each case:
1627
   1. If :member:`Solver::Options::update_state_every_iteration` is
      false, then the user's state is changed at every residual and/or
      jacobian evaluation. Any user provided IterationCallbacks should
      **not** inspect and depend on the user visible state while the
      solver is running, since it will have undefined contents.

   2. If :member:`Solver::Options::update_state_every_iteration` is
      true, then the user's state is changed at every residual and/or
      jacobian evaluation, BUT the solver will ensure that before the
      user provided `IterationCallbacks` are called, the user visible
      state will be updated to the current best point found by the
      solver.
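
   A sketch of registering a callback that reads the updated state at
   every iteration; ``MyCallback`` is a hypothetical callback class
   and ``x`` a parameter block of the problem:

   .. code-block:: c++

      class MyCallback : public ceres::IterationCallback {
       public:
        explicit MyCallback(const double* x) : x_(x) {}
        ceres::CallbackReturnType operator()(
            const ceres::IterationSummary& summary) override {
          // Safe to read x_ here because update_state_every_iteration
          // is true, so it holds the current best point.
          std::cout << summary.iteration << ": x[0] = " << x_[0] << "\n";
          return ceres::SOLVER_CONTINUE;
        }

       private:
        const double* x_;
      };

      void SolveWithCallback(ceres::Problem* problem, double* x) {
        MyCallback callback(x);
        ceres::Solver::Options options;
        options.update_state_every_iteration = true;
        options.callbacks.push_back(&callback);
        ceres::Solver::Summary summary;
        ceres::Solve(options, problem, &summary);
      }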
1640
.. member:: EvaluationCallback* Solver::Options::evaluation_callback
1642
1643 Default: ``NULL``
1644
1645 If non-``NULL``, gets notified when Ceres is about to evaluate the
1646 residuals and/or Jacobians. This enables sharing computation between
1647 residuals, which in some cases is important for efficient cost
1648 evaluation. See :class:`EvaluationCallback` for details.
1649
1650 **Note**: Evaluation callbacks are incompatible with inner
1651 iterations.
1652
1653 **Warning**: This interacts with
1654 :member:`Solver::Options::update_state_every_iteration`. See the
1655 documentation for that option for more details.
1656
1657 The solver does `not` take ownership of the pointer.
1658
1659:class:`ParameterBlockOrdering`
1660===============================
1661
1662.. class:: ParameterBlockOrdering
1663
1664 ``ParameterBlockOrdering`` is a class for storing and manipulating
1665 an ordered collection of groups/sets with the following semantics:
1666
1667 Group IDs are non-negative integer values. Elements are any type
1668 that can serve as a key in a map or an element of a set.
1669
1670 An element can only belong to one group at a time. A group may
1671 contain an arbitrary number of elements.
1672
1673 Groups are ordered by their group id.
1674
1675.. function:: bool ParameterBlockOrdering::AddElementToGroup(const double* element, const int group)
1676
1677 Add an element to a group. If a group with this id does not exist,
1678 one is created. This method can be called any number of times for
1679 the same element. Group ids should be non-negative numbers. Return
1680 value indicates if adding the element was a success.
1681
1682.. function:: void ParameterBlockOrdering::Clear()
1683
1684 Clear the ordering.
1685
1686.. function:: bool ParameterBlockOrdering::Remove(const double* element)
1687
1688 Remove the element, no matter what group it is in. If the element
1689 is not a member of any group, calling this method will result in a
1690 crash. Return value indicates if the element was actually removed.
1691
1692.. function:: void ParameterBlockOrdering::Reverse()
1693
1694 Reverse the order of the groups in place.
1695
1696.. function:: int ParameterBlockOrdering::GroupId(const double* element) const
1697
1698 Return the group id for the element. If the element is not a member
1699 of any group, return -1.
1700
1701.. function:: bool ParameterBlockOrdering::IsMember(const double* element) const
1702
1703 True if there is a group containing the parameter block.
1704
1705.. function:: int ParameterBlockOrdering::GroupSize(const int group) const
1706
   Return the number of elements in the group. This function always
   succeeds, i.e., implicitly there exists a (possibly empty) group
   for every integer.
1709
1710.. function:: int ParameterBlockOrdering::NumElements() const
1711
1712 Number of elements in the ordering.
1713
1714.. function:: int ParameterBlockOrdering::NumGroups() const
1715
1716 Number of groups with one or more elements.
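
For example (a sketch; the arrays stand in for parameter blocks of a
problem), the semantics above imply:

.. code-block:: c++

   double x[3], y[2], z[4];

   ceres::ParameterBlockOrdering ordering;
   ordering.AddElementToGroup(x, 0);
   ordering.AddElementToGroup(y, 0);
   ordering.AddElementToGroup(z, 1);
   // Adding x again with a different group id moves it, since an
   // element can only belong to one group at a time.
   ordering.AddElementToGroup(x, 1);

   // ordering.GroupId(x) == 1, ordering.GroupId(y) == 0
   // ordering.NumGroups() == 2 and ordering.NumElements() == 3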
1717
1718:class:`EvaluationCallback`
1719===========================
1720
1721.. class:: EvaluationCallback
1722
1723 Interface for receiving callbacks before Ceres evaluates residuals or
1724 Jacobians:
1725
1726 .. code-block:: c++
1727
     class EvaluationCallback {
      public:
       virtual ~EvaluationCallback() {}
       virtual void PrepareForEvaluation(bool evaluate_jacobians,
                                         bool new_evaluation_point) = 0;
     };
1734
1735 ``PrepareForEvaluation()`` is called before Ceres requests residuals
1736 or jacobians for a given setting of the parameters. User parameters
1737 (the double* values provided to the cost functions) are fixed until
1738 the next call to ``PrepareForEvaluation()``. If
1739 ``new_evaluation_point == true``, then this is a new point that is
1740 different from the last evaluated point. Otherwise, it is the same
1741 point that was evaluated previously (either jacobian or residual) and
1742 the user can use cached results from previous evaluations. If
1743 ``evaluate_jacobians`` is true, then Ceres will request jacobians in
1744 the upcoming cost evaluation.
1745
1746 Using this callback interface, Ceres can notify you when it is about
1747 to evaluate the residuals or jacobians. With the callback, you can
1748 share computation between residual blocks by doing the shared
1749 computation in PrepareForEvaluation() before Ceres calls
1750 CostFunction::Evaluate() on all the residuals. It also enables
1751 caching results between a pure residual evaluation and a residual &
1752 jacobian evaluation, via the new_evaluation_point argument.
1753
   One use case for this callback is if the cost function computation
   is moved to the GPU. In that case, the prepare call does the actual cost
1756 function evaluation, and subsequent calls from Ceres to the actual
1757 cost functions merely copy the results from the GPU onto the
1758 corresponding blocks for Ceres to plug into the solver.
1759
1760 **Note**: Ceres provides no mechanism to share data other than the
1761 notification from the callback. Users must provide access to
1762 pre-computed shared data to their cost functions behind the scenes;
1763 this all happens without Ceres knowing. One approach is to put a
1764 pointer to the shared data in each cost function (recommended) or to
1765 use a global shared variable (discouraged; bug-prone). As far as
1766 Ceres is concerned, it is evaluating cost functions like any other;
1767 it just so happens that behind the scenes the cost functions reuse
1768 pre-computed data to execute faster.
1769
1770 See ``evaluation_callback_test.cc`` for code that explicitly verifies
1771 the preconditions between ``PrepareForEvaluation()`` and
1772 ``CostFunction::Evaluate()``.
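
   As a sketch (``SharedData`` and its ``Precompute()`` method are
   hypothetical), a callback that refreshes shared intermediate
   results whenever the parameters change might look like:

   .. code-block:: c++

     class SharedDataCallback : public ceres::EvaluationCallback {
      public:
       explicit SharedDataCallback(SharedData* data) : data_(data) {}
       void PrepareForEvaluation(bool evaluate_jacobians,
                                 bool new_evaluation_point) override {
         // Only recompute when the parameters have actually changed;
         // otherwise the results cached in *data_ are still valid.
         if (new_evaluation_point) {
           data_->Precompute(evaluate_jacobians);
         }
       }

      private:
       SharedData* data_;
     };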
1773
1774:class:`IterationCallback`
1775==========================
1776
1777.. class:: IterationSummary
1778
1779 :class:`IterationSummary` describes the state of the minimizer at
1780 the end of each iteration.
1781
1782.. member:: int32 IterationSummary::iteration
1783
1784 Current iteration number.
1785
1786.. member:: bool IterationSummary::step_is_valid
1787
1788 Step was numerically valid, i.e., all values are finite and the
1789 step reduces the value of the linearized model.
1790
1791 **Note**: :member:`IterationSummary::step_is_valid` is `false`
1792 when :member:`IterationSummary::iteration` = 0.
1793
1794.. member:: bool IterationSummary::step_is_nonmonotonic
1795
1796 Step did not reduce the value of the objective function
1797 sufficiently, but it was accepted because of the relaxed
1798 acceptance criterion used by the non-monotonic trust region
1799 algorithm.
1800
1801 **Note**: :member:`IterationSummary::step_is_nonmonotonic` is
   `false` when :member:`IterationSummary::iteration` = 0.
1803
1804.. member:: bool IterationSummary::step_is_successful
1805
   Whether or not the minimizer accepted this step.
1807
1808 If the ordinary trust region algorithm is used, this means that the
1809 relative reduction in the objective function value was greater than
1810 :member:`Solver::Options::min_relative_decrease`. However, if the
1811 non-monotonic trust region algorithm is used
1812 (:member:`Solver::Options::use_nonmonotonic_steps` = `true`), then
1813 even if the relative decrease is not sufficient, the algorithm may
1814 accept the step and the step is declared successful.
1815
1816 **Note**: :member:`IterationSummary::step_is_successful` is `false`
   when :member:`IterationSummary::iteration` = 0.
1818
1819.. member:: double IterationSummary::cost
1820
1821 Value of the objective function.
1822
1823.. member:: double IterationSummary::cost_change
1824
1825 Change in the value of the objective function in this
1826 iteration. This can be positive or negative.
1827
1828.. member:: double IterationSummary::gradient_max_norm
1829
1830 Infinity norm of the gradient vector.
1831
1832.. member:: double IterationSummary::gradient_norm
1833
1834 2-norm of the gradient vector.
1835
1836.. member:: double IterationSummary::step_norm
1837
1838 2-norm of the size of the step computed in this iteration.
1839
1840.. member:: double IterationSummary::relative_decrease
1841
1842 For trust region algorithms, the ratio of the actual change in cost
1843 and the change in the cost of the linearized approximation.
1844
   This field is not used when a line search minimizer is used.
1846
1847.. member:: double IterationSummary::trust_region_radius
1848
1849 Size of the trust region at the end of the current iteration. For
1850 the Levenberg-Marquardt algorithm, the regularization parameter is
   1.0 / :member:`IterationSummary::trust_region_radius`.
1852
1853.. member:: double IterationSummary::eta
1854
1855 For the inexact step Levenberg-Marquardt algorithm, this is the
1856 relative accuracy with which the step is solved. This number is
1857 only applicable to the iterative solvers capable of solving linear
1858 systems inexactly. Factorization-based exact solvers always have an
1859 eta of 0.0.
1860
1861.. member:: double IterationSummary::step_size
1862
   Step size computed by the line search algorithm.
1864
1865 This field is not used when a trust region minimizer is used.
1866
1867.. member:: int IterationSummary::line_search_function_evaluations
1868
1869 Number of function evaluations used by the line search algorithm.
1870
1871 This field is not used when a trust region minimizer is used.
1872
1873.. member:: int IterationSummary::linear_solver_iterations
1874
1875 Number of iterations taken by the linear solver to solve for the
1876 trust region step.
1877
1878 Currently this field is not used when a line search minimizer is
1879 used.
1880
1881.. member:: double IterationSummary::iteration_time_in_seconds
1882
1883 Time (in seconds) spent inside the minimizer loop in the current
1884 iteration.
1885
1886.. member:: double IterationSummary::step_solver_time_in_seconds
1887
1888 Time (in seconds) spent inside the trust region step solver.
1889
1890.. member:: double IterationSummary::cumulative_time_in_seconds
1891
1892 Time (in seconds) since the user called Solve().
1893
1894
1895.. class:: IterationCallback
1896
1897 Interface for specifying callbacks that are executed at the end of
1898 each iteration of the minimizer.
1899
1900 .. code-block:: c++
1901
1902 class IterationCallback {
1903 public:
1904 virtual ~IterationCallback() {}
1905 virtual CallbackReturnType operator()(const IterationSummary& summary) = 0;
1906 };
1907
1908
1909 The solver uses the return value of ``operator()`` to decide whether
1910 to continue solving or to terminate. The user can return three
1911 values.
1912
1913 #. ``SOLVER_ABORT`` indicates that the callback detected an abnormal
1914 situation. The solver returns without updating the parameter
1915 blocks (unless ``Solver::Options::update_state_every_iteration`` is
1916 set true). Solver returns with ``Solver::Summary::termination_type``
1917 set to ``USER_FAILURE``.
1918
1919 #. ``SOLVER_TERMINATE_SUCCESSFULLY`` indicates that there is no need
1920 to optimize anymore (some user specified termination criterion
1921 has been met). Solver returns with
      ``Solver::Summary::termination_type`` set to ``USER_SUCCESS``.
1923
1924 #. ``SOLVER_CONTINUE`` indicates that the solver should continue
1925 optimizing.
1926
1927 For example, the following :class:`IterationCallback` is used
1928 internally by Ceres to log the progress of the optimization.
1929
1930 .. code-block:: c++
1931
1932 class LoggingCallback : public IterationCallback {
1933 public:
1934 explicit LoggingCallback(bool log_to_stdout)
1935 : log_to_stdout_(log_to_stdout) {}
1936
1937 ~LoggingCallback() {}
1938
1939 CallbackReturnType operator()(const IterationSummary& summary) {
1940 const char* kReportRowFormat =
1941 "% 4d: f:% 8e d:% 3.2e g:% 3.2e h:% 3.2e "
1942 "rho:% 3.2e mu:% 3.2e eta:% 3.2e li:% 3d";
1943 string output = StringPrintf(kReportRowFormat,
1944 summary.iteration,
1945 summary.cost,
1946 summary.cost_change,
1947 summary.gradient_max_norm,
1948 summary.step_norm,
1949 summary.relative_decrease,
1950 summary.trust_region_radius,
1951 summary.eta,
1952 summary.linear_solver_iterations);
1953 if (log_to_stdout_) {
1954 cout << output << endl;
1955 } else {
1956 VLOG(1) << output;
1957 }
1958 return SOLVER_CONTINUE;
1959 }
1960
1961 private:
1962 const bool log_to_stdout_;
1963 };
1964
1965
1966
1967:class:`CRSMatrix`
1968==================
1969
1970.. class:: CRSMatrix
1971
1972 A compressed row sparse matrix used primarily for communicating the
1973 Jacobian matrix to the user.
1974
1975.. member:: int CRSMatrix::num_rows
1976
1977 Number of rows.
1978
1979.. member:: int CRSMatrix::num_cols
1980
1981 Number of columns.
1982
1983.. member:: vector<int> CRSMatrix::rows
1984
1985 :member:`CRSMatrix::rows` is a :member:`CRSMatrix::num_rows` + 1
1986 sized array that points into the :member:`CRSMatrix::cols` and
1987 :member:`CRSMatrix::values` array.
1988
1989.. member:: vector<int> CRSMatrix::cols
1990
1991 :member:`CRSMatrix::cols` contain as many entries as there are
1992 non-zeros in the matrix.
1993
1994 For each row ``i``, ``cols[rows[i]]`` ... ``cols[rows[i + 1] - 1]``
1995 are the indices of the non-zero columns of row ``i``.
1996
.. member:: vector<double> CRSMatrix::values
1998
1999 :member:`CRSMatrix::values` contain as many entries as there are
2000 non-zeros in the matrix.
2001
2002 For each row ``i``,
2003 ``values[rows[i]]`` ... ``values[rows[i + 1] - 1]`` are the values
2004 of the non-zero columns of row ``i``.
2005
2006e.g., consider the 3x4 sparse matrix
2007
2008.. code-block:: c++
2009
2010 0 10 0 4
2011 0 2 -3 2
2012 1 2 0 0
2013
2014The three arrays will be:
2015
2016.. code-block:: c++
2017
2018 -row0- ---row1--- -row2-
2019 rows = [ 0, 2, 5, 7]
2020 cols = [ 1, 3, 1, 2, 3, 0, 1]
2021 values = [10, 4, 2, -3, 2, 1, 2]
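
Given this layout, iterating over the non-zero entries of a
:class:`CRSMatrix` is a matter of walking the ``rows`` array; a
minimal sketch:

.. code-block:: c++

   void PrintNonZeros(const ceres::CRSMatrix& m) {
     for (int i = 0; i < m.num_rows; ++i) {
       for (int idx = m.rows[i]; idx < m.rows[i + 1]; ++idx) {
         std::cout << "(" << i << ", " << m.cols[idx] << ") = "
                   << m.values[idx] << "\n";
       }
     }
   }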
2022
2023
2024:class:`Solver::Summary`
2025========================
2026
2027.. class:: Solver::Summary
2028
2029 Summary of the various stages of the solver after termination.
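
   A typical usage sketch, assuming ``problem`` is an already
   constructed :class:`Problem`:

   .. code-block:: c++

      ceres::Solver::Options options;
      ceres::Solver::Summary summary;
      ceres::Solve(options, &problem, &summary);

      std::cout << summary.BriefReport() << "\n";
      if (!summary.IsSolutionUsable()) {
        std::cout << summary.message << "\n";
      }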
2030
2031.. function:: string Solver::Summary::BriefReport() const
2032
2033 A brief one line description of the state of the solver after
2034 termination.
2035
2036.. function:: string Solver::Summary::FullReport() const
2037
2038 A full multiline description of the state of the solver after
2039 termination.
2040
2041.. function:: bool Solver::Summary::IsSolutionUsable() const
2042
2043 Whether the solution returned by the optimization algorithm can be
2044 relied on to be numerically sane. This will be the case if
   `Solver::Summary::termination_type` is set to `CONVERGENCE`,
2046 `USER_SUCCESS` or `NO_CONVERGENCE`, i.e., either the solver
2047 converged by meeting one of the convergence tolerances or because
2048 the user indicated that it had converged or it ran to the maximum
2049 number of iterations or time.
2050
2051.. member:: MinimizerType Solver::Summary::minimizer_type
2052
2053 Type of minimization algorithm used.
2054
2055.. member:: TerminationType Solver::Summary::termination_type
2056
2057 The cause of the minimizer terminating.
2058
2059.. member:: string Solver::Summary::message
2060
2061 Reason why the solver terminated.
2062
2063.. member:: double Solver::Summary::initial_cost
2064
2065 Cost of the problem (value of the objective function) before the
2066 optimization.
2067
2068.. member:: double Solver::Summary::final_cost
2069
2070 Cost of the problem (value of the objective function) after the
2071 optimization.
2072
2073.. member:: double Solver::Summary::fixed_cost
2074
2075 The part of the total cost that comes from residual blocks that
2076 were held fixed by the preprocessor because all the parameter
2077 blocks that they depend on were fixed.
2078
2079.. member:: vector<IterationSummary> Solver::Summary::iterations
2080
2081 :class:`IterationSummary` for each minimizer iteration in order.
2082
2083.. member:: int Solver::Summary::num_successful_steps
2084
2085 Number of minimizer iterations in which the step was
2086 accepted. Unless :member:`Solver::Options::use_non_monotonic_steps`
2087 is `true` this is also the number of steps in which the objective
2088 function value/cost went down.
2089
2090.. member:: int Solver::Summary::num_unsuccessful_steps
2091
2092 Number of minimizer iterations in which the step was rejected
2093 either because it did not reduce the cost enough or the step was
2094 not numerically valid.
2095
2096.. member:: int Solver::Summary::num_inner_iteration_steps
2097
2098 Number of times inner iterations were performed.
2099
.. member:: int Solver::Summary::num_line_search_steps
2101
2102 Total number of iterations inside the line search algorithm across
2103 all invocations. We call these iterations "steps" to distinguish
2104 them from the outer iterations of the line search and trust region
2105 minimizer algorithms which call the line search algorithm as a
2106 subroutine.
2107
2108.. member:: double Solver::Summary::preprocessor_time_in_seconds
2109
2110 Time (in seconds) spent in the preprocessor.
2111
2112.. member:: double Solver::Summary::minimizer_time_in_seconds
2113
2114 Time (in seconds) spent in the Minimizer.
2115
2116.. member:: double Solver::Summary::postprocessor_time_in_seconds
2117
2118 Time (in seconds) spent in the post processor.
2119
2120.. member:: double Solver::Summary::total_time_in_seconds
2121
2122 Time (in seconds) spent in the solver.
2123
2124.. member:: double Solver::Summary::linear_solver_time_in_seconds
2125
2126 Time (in seconds) spent in the linear solver computing the trust
2127 region step.
2128
2129.. member:: int Solver::Summary::num_linear_solves
2130
2131 Number of times the Newton step was computed by solving a linear
2132 system. This does not include linear solves used by inner
2133 iterations.
2134
2135.. member:: double Solver::Summary::residual_evaluation_time_in_seconds
2136
2137 Time (in seconds) spent evaluating the residual vector.
2138
2139.. member:: int Solver::Summary::num_residual_evaluations
2140
2141 Number of times only the residuals were evaluated.
2142
2143.. member:: double Solver::Summary::jacobian_evaluation_time_in_seconds
2144
2145 Time (in seconds) spent evaluating the Jacobian matrix.
2146
2147.. member:: int Solver::Summary::num_jacobian_evaluations
2148
2149 Number of times only the Jacobian and the residuals were evaluated.
2150
2151.. member:: double Solver::Summary::inner_iteration_time_in_seconds
2152
2153 Time (in seconds) spent doing inner iterations.
2154
2155.. member:: int Solver::Summary::num_parameter_blocks
2156
2157 Number of parameter blocks in the problem.
2158
2159.. member:: int Solver::Summary::num_parameters
2160
2161 Number of parameters in the problem.
2162
2163.. member:: int Solver::Summary::num_effective_parameters
2164
2165 Dimension of the tangent space of the problem (or the number of
2166 columns in the Jacobian for the problem). This is different from
2167 :member:`Solver::Summary::num_parameters` if a parameter block is
2168 associated with a :class:`LocalParameterization`.
2169
2170.. member:: int Solver::Summary::num_residual_blocks
2171
2172 Number of residual blocks in the problem.
2173
2174.. member:: int Solver::Summary::num_residuals
2175
2176 Number of residuals in the problem.
2177
2178.. member:: int Solver::Summary::num_parameter_blocks_reduced
2179
2180 Number of parameter blocks in the problem after the inactive and
2181 constant parameter blocks have been removed. A parameter block is
2182 inactive if no residual block refers to it.
2183
2184.. member:: int Solver::Summary::num_parameters_reduced
2185
2186 Number of parameters in the reduced problem.
2187
2188.. member:: int Solver::Summary::num_effective_parameters_reduced
2189
2190 Dimension of the tangent space of the reduced problem (or the
2191 number of columns in the Jacobian for the reduced problem). This is
2192 different from :member:`Solver::Summary::num_parameters_reduced` if
2193 a parameter block in the reduced problem is associated with a
2194 :class:`LocalParameterization`.
2195
2196.. member:: int Solver::Summary::num_residual_blocks_reduced
2197
2198 Number of residual blocks in the reduced problem.
2199
2200.. member:: int Solver::Summary::num_residuals_reduced
2201
2202 Number of residuals in the reduced problem.
2203
2204.. member:: int Solver::Summary::num_threads_given
2205
2206 Number of threads specified by the user for Jacobian and residual
2207 evaluation.
2208
2209.. member:: int Solver::Summary::num_threads_used
2210
2211 Number of threads actually used by the solver for Jacobian and
2212 residual evaluation. This number is not equal to
   :member:`Solver::Summary::num_threads_given` if neither `OpenMP`
   nor `CXX11_THREADS` is available.
2215
2216.. member:: LinearSolverType Solver::Summary::linear_solver_type_given
2217
2218 Type of the linear solver requested by the user.
2219
2220.. member:: LinearSolverType Solver::Summary::linear_solver_type_used
2221
2222 Type of the linear solver actually used. This may be different from
2223 :member:`Solver::Summary::linear_solver_type_given` if Ceres
2224 determines that the problem structure is not compatible with the
2225 linear solver requested or if the linear solver requested by the
   user is not available, e.g. the user requested
2227 `SPARSE_NORMAL_CHOLESKY` but no sparse linear algebra library was
2228 available.
2229
2230.. member:: vector<int> Solver::Summary::linear_solver_ordering_given
2231
2232 Size of the elimination groups given by the user as hints to the
2233 linear solver.
2234
2235.. member:: vector<int> Solver::Summary::linear_solver_ordering_used
2236
2237 Size of the parameter groups used by the solver when ordering the
   columns of the Jacobian. This may be different from
2239 :member:`Solver::Summary::linear_solver_ordering_given` if the user
2240 left :member:`Solver::Summary::linear_solver_ordering_given` blank
2241 and asked for an automatic ordering, or if the problem contains
2242 some constant or inactive parameter blocks.
2243
2244.. member:: std::string Solver::Summary::schur_structure_given
2245
2246 For Schur type linear solvers, this string describes the template
2247 specialization which was detected in the problem and should be
2248 used.
2249
2250.. member:: std::string Solver::Summary::schur_structure_used
2251
   For Schur type linear solvers, this string describes the template
   specialization that was actually instantiated and used. This will
   be different from
   :member:`Solver::Summary::schur_structure_given` if the
   corresponding template specialization does not exist.
2257
2258 Template specializations can be added to ceres by editing
2259 ``internal/ceres/generate_template_specializations.py``
2260
2261.. member:: bool Solver::Summary::inner_iterations_given
2262
2263 `True` if the user asked for inner iterations to be used as part of
2264 the optimization.
2265
2266.. member:: bool Solver::Summary::inner_iterations_used
2267
2268 `True` if the user asked for inner iterations to be used as part of
2269 the optimization and the problem structure was such that they were
2270 actually performed. For example, in a problem with just one parameter
2271 block, inner iterations are not performed.
2272
.. member:: vector<int> Solver::Summary::inner_iteration_ordering_given
2274
2275 Size of the parameter groups given by the user for performing inner
2276 iterations.
2277
.. member:: vector<int> Solver::Summary::inner_iteration_ordering_used
2279
   Size of the parameter groups used by the solver for performing
   inner iterations. This may be different from
2282 :member:`Solver::Summary::inner_iteration_ordering_given` if the
2283 user left :member:`Solver::Summary::inner_iteration_ordering_given`
2284 blank and asked for an automatic ordering, or if the problem
2285 contains some constant or inactive parameter blocks.
2286
2287.. member:: PreconditionerType Solver::Summary::preconditioner_type_given
2288
2289 Type of the preconditioner requested by the user.
2290
2291.. member:: PreconditionerType Solver::Summary::preconditioner_type_used
2292
   Type of the preconditioner actually used. This may be different
   from :member:`Solver::Summary::preconditioner_type_given` if Ceres
   determines that the problem structure is not compatible with the
   preconditioner requested or if the preconditioner requested by the
   user is not available.
2298
2299.. member:: VisibilityClusteringType Solver::Summary::visibility_clustering_type
2300
2301 Type of clustering algorithm used for visibility based
2302 preconditioning. Only meaningful when the
   :member:`Solver::Summary::preconditioner_type_used` is
2304 ``CLUSTER_JACOBI`` or ``CLUSTER_TRIDIAGONAL``.
2305
2306.. member:: TrustRegionStrategyType Solver::Summary::trust_region_strategy_type
2307
2308 Type of trust region strategy.
2309
2310.. member:: DoglegType Solver::Summary::dogleg_type
2311
2312 Type of dogleg strategy used for solving the trust region problem.
2313
2314.. member:: DenseLinearAlgebraLibraryType Solver::Summary::dense_linear_algebra_library_type
2315
2316 Type of the dense linear algebra library used.
2317
2318.. member:: SparseLinearAlgebraLibraryType Solver::Summary::sparse_linear_algebra_library_type
2319
2320 Type of the sparse linear algebra library used.
2321
2322.. member:: LineSearchDirectionType Solver::Summary::line_search_direction_type
2323
2324 Type of line search direction used.
2325
2326.. member:: LineSearchType Solver::Summary::line_search_type
2327
2328 Type of the line search algorithm used.
2329
2330.. member:: LineSearchInterpolationType Solver::Summary::line_search_interpolation_type
2331
2332 When performing line search, the degree of the polynomial used to
2333 approximate the objective function.
2334
2335.. member:: NonlinearConjugateGradientType Solver::Summary::nonlinear_conjugate_gradient_type
2336
2337 If the line search direction is `NONLINEAR_CONJUGATE_GRADIENT`,
2338 then this indicates the particular variant of non-linear conjugate
2339 gradient used.
2340
2341.. member:: int Solver::Summary::max_lbfgs_rank
2342
2343 If the type of the line search direction is `LBFGS`, then this
2344 indicates the rank of the Hessian approximation.