Squashed 'third_party/ceres/' changes from 399cda773..7f9cc571b

7f9cc571b Set CMAKE_CUDA_ARCHITECTURES depending on CUDAToolkit_VERSION
6a74af202 Remove a level of indirection when using CellInfo
f8c2994da Drop Ubuntu 20.04 and add Ubuntu 24.04 support
20954e693 Eliminate CUDA set but unused variable warning
57aba3ed0 Enable Apple linker library deduplication
1f15197be Drop macos-11 runner and support macos-14 instead
5de0fda0f Update Github actions
308a5bb43 Add missing include
715865101 Remove 32-bit MinGW from CI matrix
f71181a92 Remove remaining references to CXSparse
522210a08 Reuse macro to format version string
cd2dd06e9 Fix clang16 compiler warnings
1f2e6313a Add missing std qualifiers
125c06882 Link static cuda libs when ceres is build static
62c03d6ff Revert "Update Eigen::BDCSVD usage to comply with Eigen 3.4"
4027f6997 Update Eigen::BDCSVD usage to comply with Eigen 3.4
1b2ebabf5 Typo fixes in the documentation
2ffeb943a Fix typo in AutoDiffManifold comment and docs
da34da3da Remove CreateFakeBundleAdjustmentPartitionedJacobian
85b2c418a ClangTidy fixes
91773746b Simplify instantiation of cost functions and their functors
8b88a9ab4 Use C++17 Bessel functions
84436f791 Use native CMake TBB package configuration
556a56f21 Do not assume Homebrew usage
3fd2a72cc MinGW no longer provides a 32-bit METIS package
e776a6f1a Unbreak the Bazel build.
095d48392 Skip structure detection on preprocessor failure
a0876309a Fix typos in comments
774973731 Fix search on ceres-solver.org
85331393d Update docs for 2.2.0.
2120eae67 Optimize the computation of the LM diagonal in TinySolver
611b139b1 Fix Solver::Options::callbacks type in documentation
b652d3b4f Unbreak the windows build
b79c4d350 ClangTidy fixes
76af132d0 Update docs for 2.2.0RC3
dc7a85975 Single-threaded operations on small vectors
b379ab768 Remove MaxNumThreadsAvailable
354002f98 Schedule task in ParallerFor from the previous task
a9b3fcff4 Minor update to docs
5ccab18be Drop use of POSIX M_PI_2 and M_PI_4
4519b8d77 Drop use of POSIX M_PI
b83abdcb1 Add a default value for Solver::Summary::linear_solver_ordering_type
8d875a312 Fix checks for CUDA memory pools support
94335e3b9 More ClangTidy fixes
399395c4f Miscellaneous ClangTidy fixes
c8bed4b93 Update version_history for 2.2.0rc2
489339219 Rework MSVC warning suppression
0cea191d4 Move stream-ordered memory allocations
8e3b7d89e Fix a copy-pasta error
dc0bb8508 Various cleanups to the documentation
4588b0fbb Add an example for EvaluationCallback
0fc3fd4fc Add documentation for examples
e6b2f532b Parallelize PSE preconditioner
41672dff8 Add end-to-end BA tests for SCHUR_POWER_SERIES_EXPANSION
83ee376d8 Add an example for IterationCallback
371265094 Cleanup example code
59182a42c Update documentation
d4db6e6fe Fix typos in the documentation for EvaluationCallback
dffd8cd71 Add an accessor for the CostFunctor in DynamicAutoDiffCostFunction
bea247701 Add a missing include dir to the cuda kernels target.
18ea7d1c2 Runtime check for cudaMallocAsync support
a227045be Remove cuda-memcheck based tests
d10e786ca Remove an unused variable from CudaSparseMatrix
5a30cae58 Preparing for 2.2.0rc1
9cca67127 Enable compatibility with SuiteSparse 7.2.0
a1c02e8d3 Rework the Sphinx find module
a57e35bba Require at least CMake 3.16
863db948f Eliminate macOS sprintf warning
6f5342db6 Export ceres_cuda_kernels to project's build tree
d864d146f Add macOS 13 runner to Github workflows
01a23504d Add a workaround for CMake Policy CMP0148
de9cbde95 Work around MinGW32 manifold_test segfault
5e4b22f7f Update CudaSparseMatrix class
ed9921fc2 Fix Solver::Options in documentation
de62bf220 Two minor fixes
5f97455be Fix typos in documentation
5ba62abec Add CUDA support to windows CI builds
a98fdf582 Update CMakeLists.txt to fix Windows CUDA build
ec4907399 Fix block-sparse to crs conversion on windows
799ee91bb Fix check in CompressedRowJacobianWriter::CreateJacobian()
ee90f6cbf Detect large Jacobians and return failure instead of crashing.
310a252fb Deal with infinite initial cost correctly.
357482db7 Add NumTraits::max_digits10 for Jets
908b5b1b5 Fix type mismatch in documentation
75bacedf7 CUDA partitioned matrix view
fd6197ce0 Fixed GCC 13.1 compilation errors
df97a8f05 Improve support of older CUDA toolkit versions
085214ea7 Fix test CompressedRowSparseMatrix.Transpose
d880df09f Match new[] with delete[] in BSM
bdee4d617 Block-sparse to CRS conversion using block-structure
0f9de3daf Use page locked memory in BlockSparseMatrix
e7bd72d41 Permutation-based conversion from block-sparse to crs
abbc4e797 Explicitly compute number of non-zeros in row
96fdfd2e7 Implement tests for Euler conversion with jets
db1ebd3ff Work around lack of constexpr constructors for Jet
16a4fa04e Further Jet conversion fixes
92ad18b8a Fix a Jet conversion bug in rotation.h
a5e745d4e ClangTidy fixes
77ad8bb4e Change storage in BlockRandomAccessSparseMatrix
d340f81bd Clang Tidy fixes
54ad3dd03 Reorganize ParallelFor source files
ba360ab07 Change the value of BlockRandomAccessSparseMatrix::kMaxRowBlocks
0315c6ca9 Provide DynamicAutoDiffCostFunction deduction guide
3cdfae110 Replace Hertzberg mentions with citations
0af38a9fc Fix typos in documentation
4c969a6c1 Improve image of loss functions shape
b54f05b8e Add missing TukeyLoss to documentation
8bf4a2f42 Inexact check for ParallelAssign test
9cddce73a Explicit conversions from long to int in benchmarks (for num_threads)
5e787ab70 Using const ints in jacobian writers
f9bffbb6f Removing -Wshorten-64-to-32 warnings from examples (part 1)
e269b64f5 More ClangTidy fixes
f4eb768e0 Using int64 in file.cc. Fixing compilation error in array_utils.cc
74a0f0d24 Using int64_t for sizes and indexes in array utils
749a442d9 Clang-Tidy fixes
c4ba975ae Fix rotation_test.cc to work with older versions of Eigen
9602ed7b7 ClangFormat changes
79a554ffc Fix a bug in QuaternionRotatePoint.
f1113c08a Commenting unused parameters for better readibility
772d927e1 Replacing old style typedefs with new style usings
53df5ddcf Removing using std::...
e1ca3302a Increasing bazel timeout for heavy BA tests
f982d3071 Fixing bazel build
cb6b30662 Use hypot to compute the L^2 norm
1e2a24a8b Update Github actions to avoid deprecation warnings
285e5f9f4 Do not update brew formulae upon install
73d95b03f Clang-Tidy fixes
51d52c3ea Correct epsilon in CUDA QR test changed by last commit.
546f5337b Updates to CUDA dense linear algebra tests
19ab2c179 BlockRandomAccessMatrix Refactor
a3a062d72 Add a missing header
2b88bedb2 Remove unused variables
9beea728f Fix a bug in CoordinateDescentMinimizer
8e5d83f07 ClangFormat and ClangTidy changes
b15851508 Parallel operations on vectors
2fd81de12 Add build configuration with CUDA on Linux
06bfe6ffa Remove OpenMP and No threading backends.
352b320ab Fixed SuiteSparse 6.0 version parsing
8fd7828e3 ClangTidy fixes
0424615dc Fix PartitionedMatrixView usage in evaluation_benchmark
77a54dd3d Parallel updates to block-diagonal EtE FtF
946fa50de ClangTidy fixes
e4bef9505 Refactor PartitionedMatrixView to cache the partitions
d3201798e Clean up sparse_cholesky_test
addcd342f ClangTidy fixes
c2e7002d2 Remove an unused variable from evaluation_benchmark.cc
fef6d5875 Parallel left products for PartitionedMatrixView
37a3cb384 Update SuiteSparse in MSVC Github workflow
5d53d1ee3 Parallel for with iteration costs and left product
9aa52c6ff Use FindCUDAToolkit for CMake >= 3.17
47e03a6d8 Add const accessor for Problem::Options used by Problem
d6a931009 Clang Tidy Fixes
9364e31ee Fix a regression in SuiteSparse::AnalyzeCholesky
89b3e1f88 Remove unused includes of gflags and gtest
984079003 Fix missing regex dependency for gtest on QNX
6b296f27f Fix missing namespace qualification and docs for Manifold gtest macro
6685e629f AddBlockStructureTranspose to BlockSparseMatrix
699e3f3b3 Fix a link error in evaluation_benchmark.cc
19a3d07f9 Add multiplication benchmarks on BAL data
b221b1294 Format code with clang-format.
ccf32d70c Purge all remaining references to (defunct) LocalParameterization
5f8c406e2 struct ContextImpl -> class ContextImpl
9893c534c Several cleanups.
a78a57472 ClangTidy fixes
b1fe60330 Parallel right products for partitioned view
16668eedf Fix a memory leak in ContextImpl
d129938d5 Third time is the charm
4129b214a More fixes to cuda_dense_cholesky_test.cc
5c01d2573 Remove unused variables from cuda_dense_cholesky_test.cc
5e877ae69 Fix the Bazel build
d89290ffa Fix evalution_benchmark compilability
ae7f456e3 ClangTidy fixes
9438c370f Restore the semantics of TrustRegionMinimizer
c964fce90 Fix bug in cuda_kernels_test
afaad5678 Fix a typo
b7116824b Evaluation benchmark
2b89ce66f Add generalized Euler Angle conversions
8230edc6c ClangTidy fixes
9a2894763 Speed up locking when num_threads = 1.
739f2a25a Parallelize block_jacobi_preconditioner
c0c4f9394 Change implementation of parallel for
fc826c578 CUDA Cleanup
660af905f Fix a bug in TrustRegionMinimizer.
4cd257cf4 Let NumericDiffFirstOrderFunction take a dynamically sized parameter vector
6c27ac6d5 Fix repeated SpMV Benchmark
430a292ac Add a missing include
858b4b89b More ClangTidy fixes
e9e995740 ClangTidy fixes
7f8e930a0 Fix lint errors
f86a3bdbe Unify Block handling across matrix types
5f1946879 clang-formated source
00a05cf70 CUDA SDK Version-based SpMV Selection
de0f74e40 Optimize  the BlockCRSJacobiPreconditioner
ba65ddd31 Improvements to benchmarks
42352e2e2 Added CUDA Jacobi Preconditioner
f802a09ff &foo[0] -> foo.data()
344929647 Add CUDA GPU and Runtime Detection
6085e45be Minor CUDA cleanup.
e15ec89f3 Speed up bundle_adjuster
058a72782 One more ClangTidy fix.
f6f2f0d16 ClangTidy cleanups Also some clang-format cleanups.
9d74c6913 Fix some more errant CATD warnings
a4f744095 Remove an unused variable from compressed_row_sparse_matrix.cc
388c14286 Fix GCC 12.1.1 LTO -Walloc-size-larger-than= warnings
9ec4f7e44 Refactor BlockJacobiPreconditioner
adda97acd Fixed a few more missing CERES_NO_CUDA guards
22aeb3584 Fix a missing string assignment in solver.cc
3b891f767 Insert missing CUDA guards.
829089053 CUDA CGNR, Part 4: CudaCgnrSolver
6ab435d77 Fix a missing CERES_NO_CUDA guard
c560bc2be CUDA CGNR, Part 3: CudaSparseMatrix
c914c7a2b CUDA CGNR, Part 2: CudaVector
3af3dee18 Simplify the implementation to convert from BlockSparseMatrix to CompressedRowSparseMatrix.
242fc0795 Remove unnecessary destructors
737200ac8 Add macos-12 to Github workflow runners
9c968d406 Silence Clang warning
2c78c5f33 Small naming fixups.
2a25d86b0 Integrate schur power series expansion options to bundle adjuster
4642e4b0c Use if constexpr and map SparseMatrix as const
92d837953 Enable usage of schur power series expansion preconditioner.
f1dfac8cd Reduce the number of individual PRNG instances
79e403b15 Expand vcpkg installation instructions
7b0bb0e3f ClangTidy cleanups
c5c2afcc9 Fix solver_test.cc for preconditioners and sparse linear algebra libraries
07d333fb6 Refactor options checking for linear solvers
ba7207b0b A number of small changes.
20e85bbe3 Add power series expansion preconditioner
04899645c LinearOperator::FooMultiply -> LinearOperator::FooMultiplyAndAccumulate
288a3fde6 Add missing virtual destructors to matrix adapters
6483a2b4c Add Sergiu's name to the list of maintainers
1cf49d688 Update FindGlog.cmake to create glog::glog target
1da72ac39 Refactor ConjugateGradientsSolver
f62dccdb3 Fix the Sphere and Line Manifold formulations
3e1cc89f6 A bunch of clang-tidy fixes.
80380538a One more CATD fix
560ef46fb A bunch of minor fixes.
67bae28c1 CUDA CGNR, Part 1: Misc. CLeanup
5d0bca14d Remove ceres/internal/random.h in favor of <random>
d881b5ccf Minor fixes in comments
37516c968 Fix a bug in InnerProductComputer.
d8dad14ee CUDA Cleanup
c9d2ec8a9 Updates to sparse block matrix structures to support new sparse linear solvers.
5fe0bd45a Added MinGW to Windows Github workflow
738c027c1 Fix a logic error in iterative_refiner_test
cb6ad463d Add mixed precision support for CPU based DenseCholesky
df55682ba Fix Eigen error in 2D sphere manifolds Since Eigen does not allow to have a RowMajor column vector (see https://gitlab.com/libeigen/eigen/-/issues/416), the storage order must be set to ColMajor in that case. This fix adds that special case when generating 2D sphere manifolds.
1cf59f61e Set Github workflow NDK path explicitly
68c53bb39 Remove ceres::LocalParameterization
2f660464c Fix build issue with CUDA testing targets when compiling without gflags.
c801192d4 Minor fixes
ce9e902b8 Fix missing CERES_METIS_VERSION
d9a3dfbf2 Add a missing ifdef guard to dense_cholesky_test
5bd43a1fa Speed up DenseSparseMatrix::SquareColumnNorm.
cbc86f651 Fix the build when CUDA is not present
5af8e6449 Update year in solver.h
88e08cfe7 Mixed-precision Iterative Refinement Cholesky With CUDA
290b34ef0 Fix optional SuiteSparse + METIS test-suite names to be unique
d038e2d83 Fix use of NESDIS with SuiteSparse in tests if METIS is not found
027e741a1 Eliminated MinGW warning
4e5ea292b Fixed MSVC 2022 warning
83f6e0853 Fix use of conditional preprocessor checks within a macro in tests
70f1aac31 Fix fmin/fmax() when using Jets with float as their scalar type
5de77f399 Fix reporting of METIS version
11e637667 Fix #ifdef guards around METIS usage in EigenSparse backend and tests
0c88301e6 Provide optional METIS support
f11c25626 Fix fmin/fmax() to use Jet averaging on equality
b90053f1a Revert C++17 usage of std::exclusive_scan
dfce1e128 Link against threading library only if necessary
69eddfb6d Use find module to link against OpenMP
b4803778c Update documentation for linear_solver_ordering_type
2e764df06 Update Cuda memcheck test
443ae9ce2 Update Cuda memcheck test
55b4c3f44 Retain terminal formatting when building docs
786866d9f Generate version string at compile time
5bd83c4ac Unbreak the build with EIGENSPARSE is disabled
2335b5b4b Remove support for CXSparse
fbc2eea16 Nested dissection for ACCELERATE_SPARSE & EIGEN_SPARSE
d87fd551b Fix Ubuntu 20.04 workflow tests
71717f37c Use glog 0.6 release to run Windows Github workflow
66e0adfa7 Fix detection of sphinx-rtd-theme
d09f7e9d5 Enable postordering when computing the sparse factorization.
9b34ecef1 Unbreak the build on MacOS
8ba8fbb17 Remove Solver::Options::use_postordering
30b4d5df3 Fix the ceres.bzl to add missing cc files.
39ec5e8f9 Add Nested Dissection based fill reducing ordering
aa62dd86a Fix a build breakage
41c5fb1e8 Refactor suitesparse.h/cc
12263e283 Make the min. required version of SuiteSparse to be 4.5.6
c8493fc36 Convert internal enums to be class enums.
bb3a40c09 Add Nested Dissection ordering method to SuiteSparse
f1414cb5b Correct spelling in comments and docs.
fd2b0ceed Correct spelling (contiguous, BANS)
464abc198 Run Linux Github workflow on Ubuntu 22.04
caf614a6c Modernize code using c++17 constructs
be618133e Simplify some template metaprograms using fold expressions.
3b0096c1b Add the ability to specify the pivot threshold in Covariance::Options
40c1a7e18 Fix Github workflows
127474360 Ceres Solver now requires C++17
b5f1b7877 clang-format cleanup
32cd1115c Make the code in small_blas_generic.h more compiler friendly.
f68321e7d Update version history
b34280207 Fix MSVC small_blas_test failures
b246991b6 Update the citation instructions in the docs
c0c14abca Fix version history item numbering
d23dbac25 Update Windows install guide
e669c9fc7 Provide citation file
ff57c2e91 Update version history for 2.1.0rc2
ab9436cb9 Workaround MSVC STL deficiency in C++17 mode
97c232857 Update the included gtest to version 1.11.0
4eac7ddd2 Fix Jet lerp test regression
2ffbe126d Fix Jet test failures on ARMv8 with recent Xcode
0d6a0292c Fix unused arguments of Make1stOrderPerturbation
93511bfdc Fix SuiteSparse path and version reporting
bf329b30f Fix link to macOS badge
0133dada2 Add Github workflows
3d3d6ed71 Add missing includes
0a9c0df8a Fix path for cuda-memcheck tests
ee35ef66f ClangFormat cleanup via scripts/all_format.sh
470515985 Add missing includes for config.h
d3612c12c Set CMP0057 policy for IN_LIST operator in FindSuiteSparse.cmake
4bc100c13 Do not define unusable import targets
e91995cce Fix Ubuntu 18.04 shared library build
94af09186 Force C++ linker
a65e73885 Update installation docs
1a377d707 Fix Ubuntu 20.04 SPQR build
817f5a068 Switch to imported SuiteSparse, CXSparse, and METIS targets
b0f32a20d Hide remaining internal symbols
572395098 Add a missing include
b0aef211d Allow to store pointers in ProductManifold
9afe8cc45 Small compile fix to context_impl
284be88ca Allow ProductManifold default construction
f59059fff Bugfix to CUDA workspace handling
779634164 Fix MSVC linker error
7743d2e73 Store ProductManifold instances in a tuple
9f32c42ba Update Travis-CI status badge to .com from .org
eadfead69 Move LineManifold and SphereManifold into their own headers.
f0f8f93bb Fix docs inconsistencies
e40391efa Update version history in preparation for 2.1.0
6a37fbf9b Add static/compile time sizing to EuclideanManifold
4ad787ce1 Fix the bazel build
ae4d95df6 Two small clang-tidy fixes
f0851667b Fix MSVC compilation errors
c8658c899 Modernize more
46b3495a4 Standardize path handling using GNUInstallDirs
98bc3ca17 Fix shared library build due to missing compile features specification
8fe8ebc3a Add final specifier to public classes
84e1696f4 Add final specifier to internal classes.
518970f81 Context should be exported
09ec4997f Cleanup examples
90e58e10f Add missing #include.
15348abe9 Add CUDA based bundle adjustment tests.
57ec9dc92 Do not enforce a specific C++ standard
99698f053 Fix Apple Clang weak symbols warnings
8e0842162 Add support for dense CUDA solvers #3
aff51c907 Revert "Do not enforce a specific C++ standard"
527c3f7da Fixed gflags dependency
d839b7792 Do not enforce a specific C++ standard
f71167c62 Fixed missing include in minilog build
47502b833 Miscellaneous CUDA related changes.
bb2996681 Check CUDA is available in solver.cc
7d2e4152e Add support for dense CUDA solvers #2
f90833f5f Simplify symbol export
c6158e0ab Replace NULL by nullptr
7e4f5a51b Remove blas.h/cc as they are not used anymore.
e0fef6ef0 Add cmake option ENABLE_BITCODE for iOS builds
446487c54 Add <memory> header to all files using std::unique_ptr.
9c5f29d46 Use compiler attributes instead of [[deprecated]]
44039af2c Convert factory functions to return std::unique_ptrs.
708a2a723 Silence LocalParameterization deprecation warnings
de69e657a Fix another missing declaration warning
677711138 Fix segmentation fault in AVX2 builds
0141ca090 Deprecate LocalParameterizations
fdfa5184a Fix missing declaration warning
4742bf386 Add LineManifold.
c14f360e6 Drop trivial special members
ae65219e0 ClangTidy cleanups
a35bd1bf9 Use = default for trivial special members
db67e621e Fix dense_cholesky_test.cc comma handling.
484d3414e Replace virtual keyword by override
2092a720e Fix some nits.
36d6d8690 Add support for dense CUDA solvers #1
af5e48c71 Add SphereManifold.
182cb01c5 Normalize Jet classification and comparison
9dbd28989 Loosen tolerances in dense_qr_test.cc
cab853fd5 Add DenseQR Interface
8ae054ad9 Fix missing declaration warning in autodiff_manifold_test
408af7b1a Move the constructor and destructor for SchurComplementSolver
d51672d1c Move the constructor and destructor for DenseSchurComplementSolver
177b2f99d Use benchmark version 1.6 compatible syntax.
ce9669003 Add const accessor for functor wrapped by auto/numeric-diff objects
9367ec9cc Add a benchmark for dense linear solvers.
7d6524daf Support fma Jet
a0d81ad63 Fix a bug in AutoDiffManifold
5a99e42e1 ClangTidy fixes
40fb41355 Update .gitignore
e6e6ae087 Unbreak the bazel build
0572efc57 Fix a compilation warning in autodiff_manifold.h
475db73d0 Fix build breakage when LAPACK support is disabled.
095c9197f Fix iterative_refiner_test.cc
6d06e9b98 Add DenseCholesky
77c0c4d09 Migrate examples to use Manifolds
19eef54fc Rename Quaternion to QuaternionManifold. Also rename EigenQuaternion to EigenQuaternionManiold.
ca6d841c2 Add AutoDiffManifold
97d7e0737 Move the manifold testing matchers to manifold_test_utils.h
16436b34b Fix some more clang-tidy suggestions.
dcdefc216 Fix a bunch of clang-tidy suggestions.
d8a1b69ab Remove an unused variable from gradient_checker.cc
611b46b54 Remove the use of CHECK_NOTNULL.
125a0e9be LocalParameterization -> Manifold #1
00bfbae11 Add missing algorithm header to manifold.cc
1d5aff059 Refactor jet_tests.cc
fbd693091 Fix two unused variable warnings.
c0cb42e5f Add Problem::HasParameterization
7e2f9d9d4 Add EigenQuaternion manifold and tests for it.
b81a8bbb7 Add the Quaternion manifold and tests.
4a01dcb88 Add more invariants and documentation to manifold_test.cc
bdd80fcce Improve Manifold testing
ce1537030 Fixed missing headers in manifold.h
23b204d7e LocalParameterization -> Manifold #1
c2fab6502 Fix docs of supported sparse backends for mixed_precision_solves option
8cb441c49 Fix missing declaration warnings in GCC
d2b7f337c Remove split.cc from the bazel sources.
206061a6b Use standard c++ types in jet_test.cc
1f374a9fe Support promotion in comparison between Jet and scalars
06e68dbc5 Avoid midpoint overflow in the differential
276d24c73 Fix C++20 compilation
b1391e062 Support midpoint Jet
8426526df Support lerp Jet
57c279689 support 3-argument hypot jet
123fba61c Eigen::MappedSparseMatrix -> Eigen::Map<Eigen::SparseMatrix>
3f950c66d reworked copysign tests
48cb54d1b fix fmin and fmax NaN handling
552a4e517 support log10 jet
4e49c5422 reuse expm1 result for differential
8d3e64dd5 Use modern-style Eigen3 CMake variables
2fba61434 support log1p and expm1 jet
a668cabbc support norm jet
a3a4b6d77 support copysign jet
b75dac169 fix abs jet test comment
034bf566f use copysign for abs jet
31008453f Add example for BiCubicInterpolator
7ef4a1221 Add a section on implicit and inverse function theorems
686428f5c Move the further reading section to bibliography
e47d87fdd Add a note about Trigg's correction
d2852518d Fix the docs for Problem::RemoveResidualBlock & Problem::RemoveParameterBlock
06e02a173 Delete unused files split.h/cc
ac7268da5 Fix an 80cols issue in covariance_impl.cc
17dccef91 Add NumericDiffFirstOrderFunction
03d64141a Fix a incorrect check in reorder_program.cc
884111913 Two changes to TinySolver
4dff3ea2c Fix a number of typos in rotation.h
98719ced4 Fix a type in interfacing_with_autodiff.html
0299ce944 Update conf.py to be compatible with Sphinx 4.1.2
2a2b9bd6f Fix a bug in covariance_impl.cc
27fade7b8 Fix a bug in system_test.cc
d4eb83ee5 Fix the Jacobian in trust_region_minimizer_test.cc
5f6071a1c Fix a bug in local_parameterization_test.cc
b2e732b1e Fix errors in comments from William Gandler.
42f1d6717 Add accessors to GradientProblem
aefd37b18 Refactor small_blas_gemm_benchmark
dc20db303 [docs] Fix `IterationSummary` section. Add missing `IterationCallback`
90ba7d1ef [docs] Fix typos
c3129c3d4 Fix tests not executing
7de561e8e Fix dependency check for building documentation
4fbe218f2 Refactor small_blas_test
3fdde8ede Remove an errant double link.
20ad431f7 Fixing a typo in the version history
0c85c4092 Revert "Reduce copies involved in Jet operations"
3a02d5aaf Fix typo in LossFunctionWrapper sample code
7b2c223be Add fmax/fmin overloads for scalars
c036c7819 Reduce copies involved in Jet operations
51945e061 Introduce benchmark for Jet operations
ec4f2995b Do not check MaxNumThreadsAvailable if the thread number is set to 1.
98f639f54 Add a macro CERES_GET_FLAG.
766f2cab5 Reduce log spam in covariance_impl.cc.
941ea1347 Fix FindTBB version detection with TBB >= 2021.1.1
323c350a6 fix Eigen3_VERSION
2b32b3212 Revert "Group specializations into groups of four"
313caf1ae Allow Unity build.
4ba244cdb Group specializations into groups of four
d77a8100a Make miniglog's InitGoogleLogging argument const.
863724994 Use portable expression for constant 2/sqrt(pi)
97873ea65 Add some missing includes for glog/logging.h
d15b1bcd3 Increase tolerance in small_blas_test.cc
17cf01831 Hide 'format not a string literal' error in examples
64029909b Fix -Wno-maybe-uninitialized error
21294123d Fix nonnull arg compared to NULL error.
1dd417410 Fix -Wno-format-nonliteral
6c106bf51 Fix -Wmissing-field-initializers error
c48a32792 Use cc_binary includes so examples build as external repo
e0e14a5cd Fix errors found by -Werror
e84cf10e1 Fix an errant double in TinySolver.
66b4c33e8 updated unit quaternion rotation
d45ec47b5 Fix a typo in schur_eliminator.h

Change-Id: I60db062e44d051d50dbb3a145eec2f74d5190481
git-subtree-dir: third_party/ceres
git-subtree-split: 7f9cc571b03632f1df93ea35725a1f5dfffe2c72
Signed-off-by: Austin Schuh <austin.linux@gmail.com>
diff --git a/docs/source/nnls_solving.rst b/docs/source/nnls_solving.rst
index 285df3a..184b94b 100644
--- a/docs/source/nnls_solving.rst
+++ b/docs/source/nnls_solving.rst
@@ -1,3 +1,4 @@
+.. highlight:: c++
 
 .. default-domain:: cpp
 
@@ -12,10 +13,10 @@
 Introduction
 ============
 
-Effective use of Ceres requires some familiarity with the basic
+Effective use of Ceres Solver requires some familiarity with the basic
 components of a non-linear least squares solver, so before we describe
 how to configure and use the solver, we will take a brief look at how
-some of the core optimization algorithms in Ceres work.
+some of the core optimization algorithms in Ceres Solver work.
 
 Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
 variables, and
@@ -27,15 +28,15 @@
           L \le x \le U
   :label: nonlinsq
 
-Where, :math:`L` and :math:`U` are lower and upper bounds on the
-parameter vector :math:`x`.
+Where, :math:`L` and :math:`U` are vector lower and upper bounds on
+the parameter vector :math:`x`. The inequality holds component-wise.
 
 Since the efficient global minimization of :eq:`nonlinsq` for
 general :math:`F(x)` is an intractable problem, we will have to settle
 for finding a local minimum.
 
 In the following, the Jacobian :math:`J(x)` of :math:`F(x)` is an
-:math:`m\times n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)`
+:math:`m\times n` matrix, where :math:`J_{ij}(x) = D_j f_i(x)`
 and the gradient vector is :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2
 = J(x)^\top F(x)`.
 
@@ -75,7 +76,7 @@
 Trust region methods are in some sense dual to line search methods:
 trust region methods first choose a step size (the size of the trust
 region) and then a step direction while line search methods first
-choose a step direction and then a step size. Ceres implements
+choose a step direction and then a step size. Ceres Solver implements
 multiple algorithms in both categories.
 
 .. _section-trust-region-methods:
@@ -152,8 +153,9 @@
 solving an unconstrained optimization of the form
 
 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda  \|D(x)\Delta x\|^2
+   :label: lsqr-naive
 
-Where, :math:`\lambda` is a Lagrange multiplier that is inverse
+Where, :math:`\lambda` is a Lagrange multiplier that is inversely
 related to :math:`\mu`. In Ceres, we solve for
 
 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
@@ -162,39 +164,47 @@
 The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
 the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.
 
-Before going further, let us make some notational simplifications. We
-will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been concatenated
-at the bottom of the matrix :math:`J` and similarly a vector of zeros
-has been added to the bottom of the vector :math:`f` and the rest of
-our discussion will be in terms of :math:`J` and :math:`F`, i.e, the
-linear least squares problem.
+Before going further, let us make some notational simplifications.
+
+We will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been
+concatenated at the bottom of the matrix :math:`J(x)` and a
+corresponding vector of zeroes has been added to the bottom of
+:math:`F(x)`, i.e.:
+
+.. math:: J(x) = \begin{bmatrix} J(x) \\ \frac{1}{\sqrt{\mu}} D
+          \end{bmatrix},\quad F(x) = \begin{bmatrix} F(x) \\ 0
+          \end{bmatrix}.
+
+This allows us to re-write :eq:`lsqr` as
 
 .. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2 .
    :label: simple
 
-For all but the smallest problems the solution of :eq:`simple` in
-each iteration of the Levenberg-Marquardt algorithm is the dominant
-computational cost in Ceres. Ceres provides a number of different
-options for solving :eq:`simple`. There are two major classes of
-methods - factorization and iterative.
+and only talk about :math:`J(x)` and :math:`F(x)` going forward.
+
+For all but the smallest problems the solution of :eq:`simple` in each
+iteration of the Levenberg-Marquardt algorithm is the dominant
+computational cost. Ceres provides a number of different options for
+solving :eq:`simple`. There are two major classes of methods -
+factorization and iterative.
 
 The factorization methods are based on computing an exact solution of
-:eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
-step Levenberg-Marquardt algorithm. But it is not clear if an exact
-solution of :eq:`lsqr` is necessary at each step of the LM algorithm
-to solve :eq:`nonlinsq`. In fact, we have already seen evidence
-that this may not be the case, as :eq:`lsqr` is itself a regularized
-version of :eq:`linearapprox`. Indeed, it is possible to
-construct non-linear optimization algorithms in which the linearized
-problem is solved approximately. These algorithms are known as inexact
-Newton or truncated Newton methods [NocedalWright]_.
+:eq:`lsqr` using a Cholesky or a QR factorization and lead to the
+so-called exact step Levenberg-Marquardt algorithm. But it is not clear
+if an exact solution of :eq:`lsqr` is necessary at each step of the
+Levenberg-Marquardt algorithm. We have already seen evidence that
+this may not be the case, as :eq:`lsqr` is itself a regularized
+version of :eq:`linearapprox`. Indeed, it is possible to construct
+non-linear optimization algorithms in which the linearized problem is
+solved approximately. These algorithms are known as inexact Newton or
+truncated Newton methods [NocedalWright]_.
 
 An inexact Newton method requires two ingredients. First, a cheap
 method for approximately solving systems of linear
 equations. Typically an iterative linear solver like the Conjugate
-Gradients method is used for this
-purpose [NocedalWright]_. Second, a termination rule for
-the iterative solver. A typical termination rule is of the form
+Gradients method is used for this purpose [NocedalWright]_. Second, a
+termination rule for the iterative solver. A typical termination rule
+is of the form
 
 .. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
    :label: inexact
@@ -212,14 +222,18 @@
 iterative linear solver, the inexact step Levenberg-Marquardt
 algorithm is used.
 
+We will talk more about the various linear solvers that you can use in
+:ref:`section-linear-solver`.
+
 .. _section-dogleg:
 
 Dogleg
 ------
 
 Another strategy for solving the trust region problem :eq:`trp` was
-introduced by M. J. D. Powell. The key idea there is to compute two
-vectors
+introduced by
+`M. J. D. Powell <https://en.wikipedia.org/wiki/Michael_J._D._Powell>`_. The
+key idea there is to compute two vectors
 
 .. math::
 
@@ -253,10 +267,14 @@
 Levenberg-Marquardt solves the linear approximation from scratch with
 a smaller value of :math:`\mu`. Dogleg on the other hand, only needs
 to compute the interpolation between the Gauss-Newton and the Cauchy
-vectors, as neither of them depend on the value of :math:`\mu`.
+vectors, as neither of them depends on the value of :math:`\mu`. As a
+result the Dogleg method only solves one linear system per successful
+step, while Levenberg-Marquardt may need to solve an arbitrary number
+of linear systems before it can make progress [LourakisArgyros]_.
 
-The Dogleg method can only be used with the exact factorization based
-linear solvers.
+A disadvantage of the Dogleg implementation in Ceres Solver is that
+it can only be used with exact factorization based linear solvers.
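+
+A minimal sketch of enabling the Dogleg strategy follows, in the
+option-setting style used elsewhere in this chapter. ``dogleg_type``
+and its two values are the companion options in Ceres Solver;
+``SUBSPACE_DOGLEG`` solves the trust region problem over the two
+dimensional subspace spanned by the Gauss-Newton and Cauchy vectors.
+
+.. code-block:: c++
+
+   Solver::Options::trust_region_strategy_type = DOGLEG;
+   // TRADITIONAL_DOGLEG is Powell's method.
+   Solver::Options::dogleg_type = TRADITIONAL_DOGLEG;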
 
 .. _section-inner-iterations:
 
@@ -349,10 +367,10 @@
 enables the non-monotonic trust region algorithm as described by Conn,
 Gould & Toint in [Conn]_.
 
-Even though the value of the objective function may be larger
-than the minimum value encountered over the course of the
-optimization, the final parameters returned to the user are the
-ones corresponding to the minimum cost over all iterations.
+Even though the value of the objective function may be larger than the
+minimum value encountered over the course of the optimization, the
+final parameters returned to the user are the ones corresponding to
+the minimum cost over all iterations.
 
 The option to take non-monotonic steps is available for all trust
 region strategies.
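+
+A sketch of enabling non-monotonic steps; ``use_nonmonotonic_steps``
+is the switch described above, and the companion option
+``max_consecutive_nonmonotonic_steps`` (shown with its usual default)
+controls the length of the moving window over which progress is
+measured.
+
+.. code-block:: c++
+
+   Solver::Options::use_nonmonotonic_steps = true;
+   // Window over which the objective is allowed to be non-monotonic.
+   Solver::Options::max_consecutive_nonmonotonic_steps = 5;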
@@ -363,11 +381,13 @@
 Line Search Methods
 ===================
 
-The line search method in Ceres Solver cannot handle bounds
-constraints right now, so it can only be used for solving
-unconstrained problems.
+.. NOTE::
 
-Line search algorithms
+   The line search method in Ceres Solver cannot handle bounds
+   constraints right now, so it can only be used for solving
+   unconstrained problems.
+
+The basic line search algorithm looks something like this:
 
    1. Given an initial point :math:`x`
    2. :math:`\Delta x = -H^{-1}(x) g(x)`
@@ -387,7 +407,7 @@
 direction :math:`\Delta x` and the method used for one dimensional
 optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
 primary source of computational complexity in these
-methods. Currently, Ceres Solver supports three choices of search
+methods. Currently, Ceres Solver supports four choices of search
 directions, all aimed at large scale problems.
 
 1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
@@ -405,8 +425,8 @@
 3. ``BFGS`` A generalization of the Secant method to multiple
    dimensions in which a full, dense approximation to the inverse
    Hessian is maintained and used to compute a quasi-Newton step
-   [NocedalWright]_.  BFGS is currently the best known general
-   quasi-Newton algorithm.
+   [NocedalWright]_.  ``BFGS`` and its limited memory variant ``LBFGS``
+   are currently the best known general quasi-Newton algorithms.
 
 4. ``LBFGS`` A limited memory approximation to the full ``BFGS``
    method in which the last `M` iterations are used to approximate the
@@ -414,26 +434,31 @@
    [ByrdNocedal]_.
 
 Currently Ceres Solver supports both a backtracking and interpolation
-based Armijo line search algorithm, and a sectioning / zoom
-interpolation (strong) Wolfe condition line search algorithm.
-However, note that in order for the assumptions underlying the
-``BFGS`` and ``LBFGS`` methods to be guaranteed to be satisfied the
-Wolfe line search algorithm should be used.
+based `Armijo line search algorithm
+<https://en.wikipedia.org/wiki/Backtracking_line_search>`_
+(``ARMIJO``), and a sectioning / zoom interpolation (strong) `Wolfe condition line
+search algorithm <https://en.wikipedia.org/wiki/Wolfe_conditions>`_
+(``WOLFE``).
+
+.. NOTE::
+
+   In order for the assumptions underlying the ``BFGS`` and ``LBFGS``
+   methods to be satisfied the ``WOLFE`` algorithm must be used.
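+
+A line search minimizer satisfying these assumptions can be configured
+as follows (a sketch; the option names follow
+:class:`Solver::Options`, and the enum values are the ones named
+above):
+
+.. code-block:: c++
+
+   Solver::Options::minimizer_type = LINE_SEARCH;
+   Solver::Options::line_search_direction_type = LBFGS;
+   Solver::Options::line_search_type = WOLFE;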
 
 .. _section-linear-solver:
 
-LinearSolver
-============
+Linear Solvers
+==============
 
-Recall that in both of the trust-region methods described above, the
+Observe that for both of the trust-region methods described above, the
 key computational cost is the solution of a linear least squares
 problem of the form
 
-.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
+.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2 .
    :label: simple2
 
 Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
-f(x)`. For notational convenience let us also drop the dependence on
+F(x)`. For notational convenience let us also drop the dependence on
 :math:`x`. Then it is easy to see that solving :eq:`simple2` is
 equivalent to solving the *normal equations*.
 
@@ -444,31 +469,65 @@
 
 .. _section-qr:
 
-``DENSE_QR``
-------------
+DENSE_QR
+--------
 
 For small problems (a couple of hundred parameters and a few thousand
-residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
-of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
-:math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
-an upper triangular matrix [TrefethenBau]_. Then it can be shown that
-the solution to :eq:`normal` is given by
+residuals) with relatively dense Jacobians, QR-decomposition is the
+method of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition
+of :math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R`
+is an upper triangular matrix [TrefethenBau]_. Then it can be shown
+that the solution to :eq:`normal` is given by
 
 .. math:: \Delta x^* = -R^{-1}Q^\top f
 
+You can use QR-decomposition by setting
+:member:`Solver::Options::linear_solver_type` to ``DENSE_QR``.
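+
+For example (a minimal sketch in the option-setting style used later
+in this document; the dense backend choices are described below):
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = DENSE_QR;
+   // EIGEN is the default; LAPACK or CUDA may be selected if Ceres
+   // Solver was built with support for them.
+   Solver::Options::dense_linear_algebra_library_type = EIGEN;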
 
-Ceres uses ``Eigen`` 's dense QR factorization routines.
+By default (``Solver::Options::dense_linear_algebra_library_type =
+EIGEN``) Ceres Solver will use `Eigen Householder QR factorization
+<https://eigen.tuxfamily.org/dox-devel/classEigen_1_1HouseholderQR.html>`_.
 
-.. _section-cholesky:
+If Ceres Solver has been built with an optimized LAPACK
+implementation, then the user can also choose to use LAPACK's
+`DGEQRF`_ routine by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``LAPACK``. Depending on the `LAPACK` and the underlying `BLAS`
+implementation this may perform better than using Eigen's Householder
+QR factorization.
 
-``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
-------------------------------------------------------
+.. _DGEQRF: https://netlib.org/lapack/explore-html/df/dc5/group__variants_g_ecomputational_ga3766ea903391b5cf9008132f7440ec7b.html
 
-Large non-linear least square problems are usually sparse. In such
-cases, using a dense QR factorization is inefficient. Let :math:`H =
-R^\top R` be the Cholesky factorization of the normal equations, where
-:math:`R` is an upper triangular matrix, then the solution to
-:eq:`normal` is given by
+
+If an NVIDIA GPU is available and Ceres Solver has been built with
+CUDA support enabled, then the user can also choose to perform the
+QR-decomposition on the GPU by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``CUDA``. Depending on the GPU this can lead to a substantial
+speedup. Using CUDA only makes sense for moderate to large sized
+problems. This is because to perform the decomposition on the GPU the
+matrix :math:`J` needs to be transferred from the CPU to the GPU and
+this incurs a cost. So unless the speedup from doing the decomposition
+on the GPU is large enough to also account for the time taken to
+transfer the Jacobian to the GPU, using CUDA will not be better than
+just doing the decomposition on the CPU.
+
+.. _section-dense-normal-cholesky:
+
+DENSE_NORMAL_CHOLESKY
+---------------------
+
+It is often the case that the number of rows in the Jacobian :math:`J`
+is much larger than the number of columns. The complexity of QR
+factorization scales linearly with the number of rows, so beyond a
+certain size it is more efficient to solve :eq:`normal` using a dense
+`Cholesky factorization
+<https://en.wikipedia.org/wiki/Cholesky_decomposition>`_.
+
+Let :math:`H = R^\top R` be the Cholesky factorization of the normal
+equations, where :math:`R` is an upper triangular matrix, then the
+solution to :eq:`normal` is given by
 
 .. math::
 
@@ -479,69 +538,155 @@
 factorization of :math:`H` is the same upper triangular matrix
 :math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
 orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
-Q^\top Q R = R^\top R`. There are two variants of Cholesky
-factorization -- sparse and dense.
+Q^\top Q R = R^\top R`.
 
-``DENSE_NORMAL_CHOLESKY``  as the name implies performs a dense
-Cholesky factorization of the normal equations. Ceres uses
-``Eigen`` 's dense LDLT factorization routines.
+Unfortunately, forming the matrix :math:`H = J^\top J` squares the
+condition number. As a result while the cost of forming :math:`H` and
+computing its Cholesky factorization is lower than computing the
+QR-factorization of :math:`J`, we pay the price in terms of increased
+numerical instability and potential failure of the Cholesky
+factorization for ill-conditioned Jacobians.
 
-``SPARSE_NORMAL_CHOLESKY``, as the name implies performs a sparse
-Cholesky factorization of the normal equations. This leads to
-substantial savings in time and memory for large sparse
-problems. Ceres uses the sparse Cholesky factorization routines in
-Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_
-or the sparse Cholesky factorization algorithm in ``Eigen`` (which
-incidently is a port of the algorithm implemented inside ``CXSparse``)
+You can use dense Cholesky factorization by setting
+:member:`Solver::Options::linear_solver_type` to
+``DENSE_NORMAL_CHOLESKY``.
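+
+For example (a sketch, in the same option-setting style as above):
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = DENSE_NORMAL_CHOLESKY;
+   // As with DENSE_QR, the EIGEN, LAPACK and CUDA backends apply.
+   Solver::Options::dense_linear_algebra_library_type = EIGEN;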
+
+By default (``Solver::Options::dense_linear_algebra_library_type =
+EIGEN``) Ceres Solver will use `Eigen's LLT factorization`_ routine.
+
+.. _Eigen's LLT Factorization:  https://eigen.tuxfamily.org/dox/classEigen_1_1LLT.html
+
+If Ceres Solver has been built with an optimized LAPACK
+implementation, then the user can also choose to use LAPACK's
+`DPOTRF`_ routine by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``LAPACK``. Depending on the `LAPACK` and the underlying `BLAS`
+implementation this may perform better than using Eigen's Cholesky
+factorization.
+
+.. _DPOTRF: https://www.netlib.org/lapack/explore-html/d1/d7a/group__double_p_ocomputational_ga2f55f604a6003d03b5cd4a0adcfb74d6.html
+
+If an NVIDIA GPU is available and Ceres Solver has been built with
+CUDA support enabled, then the user can also choose to perform the
+Cholesky factorization on the GPU by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``CUDA``. Depending on the GPU this can lead to a substantial speedup.
+Using CUDA only makes sense for moderate to large sized problems. This
+is because to perform the decomposition on the GPU the matrix
+:math:`H` needs to be transferred from the CPU to the GPU and this
+incurs a cost. So unless the speedup from doing the decomposition on
+the GPU is large enough to also account for the time taken to transfer
+the Jacobian to the GPU, using CUDA will not be better than just doing
+the decomposition on the CPU.
+
+
+.. _section-sparse-normal-cholesky:
+
+SPARSE_NORMAL_CHOLESKY
+----------------------
+
+Large non-linear least square problems are usually sparse. In such
+cases, using a dense QR or Cholesky factorization is inefficient. For
+such problems, Cholesky factorization routines which treat :math:`H`
+as a sparse matrix and compute a sparse factor :math:`R` are better
+suited [Davis]_. This can lead to substantial savings in memory and
+CPU time for large sparse problems.
+
+You can use sparse Cholesky factorization by setting
+:member:`Solver::Options::linear_solver_type` to
+``SPARSE_NORMAL_CHOLESKY``.
+
+The use of this linear solver requires that Ceres is compiled with
+support for at least one of:
+
+ 1. `SuiteSparse <https://people.engr.tamu.edu/davis/suitesparse.html>`_ (``SUITE_SPARSE``).
+ 2. `Apple's Accelerate framework
+    <https://developer.apple.com/documentation/accelerate/sparse_solvers?language=objc>`_
+    (``ACCELERATE_SPARSE``).
+ 3. `Eigen's sparse linear solvers
+    <https://eigen.tuxfamily.org/dox/group__SparseCholesky__Module.html>`_
+    (``EIGEN_SPARSE``).
+
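+For example, to select the SuiteSparse backend (a sketch; substitute
+whichever of the three backends your build of Ceres supports):
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = SPARSE_NORMAL_CHOLESKY;
+   Solver::Options::sparse_linear_algebra_library_type = SUITE_SPARSE;
+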
+SuiteSparse and Accelerate offer high performance sparse Cholesky
+factorization routines as they use level-3 BLAS routines
+internally. Eigen's sparse Cholesky routines are *simplicial* and do
+not use dense linear algebra routines and as a result cannot compete
+with SuiteSparse and Accelerate, especially on large problems. To get
+the best performance out of SuiteSparse it should be linked against
+high quality BLAS and LAPACK implementations, e.g. `ATLAS
+<https://math-atlas.sourceforge.net/>`_, `OpenBLAS
+<https://www.openblas.net/>`_ or `Intel MKL
+<https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html>`_.
+
+A critical part of a sparse Cholesky factorization routine is the use
+of a fill-reducing ordering. By default Ceres Solver uses the Approximate
+Minimum Degree (``AMD``) ordering, which usually performs well, but
+there are other options that may perform better depending on the
+actual sparsity structure of the Jacobian. See :ref:`section-ordering`
+for more details.
 
 .. _section-cgnr:
 
-``CGNR``
---------
+CGNR
+----
 
-For general sparse problems, if the problem is too large for
-``CHOLMOD`` or a sparse linear algebra library is not linked into
-Ceres, another option is the ``CGNR`` solver. This solver uses the
-Conjugate Gradients solver on the *normal equations*, but without
-forming the normal equations explicitly. It exploits the relation
+For general sparse problems, if the problem is too large for sparse
+Cholesky factorization or a sparse linear algebra library is not
+linked into Ceres, another option is the ``CGNR`` solver. This solver
+uses the `Conjugate Gradients
+<https://en.wikipedia.org/wiki/Conjugate_gradient_method>`_ method on
+the *normal equations*, but without forming the normal equations
+explicitly. It exploits the relation
 
 .. math::
     H x = J^\top J x = J^\top(J x)
 
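
The relation above means that Conjugate Gradients never needs :math:`H` itself, only products :math:`Hx`. A minimal self-contained sketch of this implicit product (not Ceres code: the dense row-major :math:`J` and the ``NormalEquationsProduct`` helper are purely illustrative, whereas Ceres stores :math:`J` as a sparse block matrix):

```cpp
#include <vector>

// Evaluate H x = J^T (J x) without ever forming H = J^T J.
// J is a rows x cols matrix stored dense, row-major, for illustration only.
std::vector<double> NormalEquationsProduct(const std::vector<double>& J,
                                           int rows, int cols,
                                           const std::vector<double>& x) {
  // First product: Jx.
  std::vector<double> Jx(rows, 0.0);
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      Jx[i] += J[i * cols + j] * x[j];
    }
  }
  // Second product: accumulate J^T (J x).
  std::vector<double> Hx(cols, 0.0);
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      Hx[j] += J[i * cols + j] * Jx[i];
    }
  }
  return Hx;
}
```

For J = [1 2; 3 4] and x = (1, 1), the two products give Jx = (3, 7) and J^T(Jx) = (24, 34), the same result as forming H = J^T J = [10 14; 14 20] and multiplying, but without ever materializing H.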
-The convergence of Conjugate Gradients depends on the conditioner
-number :math:`\kappa(H)`. Usually :math:`H` is poorly conditioned and
-a :ref:`section-preconditioner` must be used to get reasonable
-performance. Currently only the ``JACOBI`` preconditioner is available
-for use with ``CGNR``. It uses the block diagonal of :math:`H` to
-precondition the normal equations.
+Because ``CGNR`` never solves the linear system exactly, when the user
+chooses ``CGNR`` as the linear solver, Ceres automatically switches
+from the exact step algorithm to an inexact step algorithm. This also
+means that ``CGNR`` can only be used with the ``LEVENBERG_MARQUARDT``
+and not the ``DOGLEG`` trust region strategy.
 
-When the user chooses ``CGNR`` as the linear solver, Ceres
-automatically switches from the exact step algorithm to an inexact
-step algorithm.
+``CGNR`` by default runs on the CPU. However, if an NVIDIA GPU is
+available and Ceres Solver has been built with CUDA support enabled,
+then the user can also choose to run ``CGNR`` on the GPU by setting
+:member:`Solver::Options::sparse_linear_algebra_library_type` to
+``CUDA_SPARSE``. The key complexity of ``CGNR`` comes from evaluating
+the two sparse-matrix vector products (SpMV) :math:`Jx` and
+:math:`J^\top y`. GPUs are particularly well suited for doing sparse
+matrix-vector products. As a result, for large problems using a GPU
+can lead to a substantial speedup.
+
+The convergence of Conjugate Gradients depends on the condition
+number :math:`\kappa(H)`. Usually :math:`H` is quite poorly
+conditioned and a `Preconditioner
+<https://en.wikipedia.org/wiki/Preconditioner>`_ must be used to get
+reasonable performance. See section on :ref:`section-preconditioner`
+for more details.
 
 .. _section-schur:
 
-``DENSE_SCHUR`` & ``SPARSE_SCHUR``
-----------------------------------
+DENSE_SCHUR & SPARSE_SCHUR
+--------------------------
 
 While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
-adjustment problems, bundle adjustment problem have a special
-structure, and a more efficient scheme for solving :eq:`normal`
-can be constructed.
+adjustment problems, they have a special sparsity structure that can
+be exploited to solve the normal equations more efficiently.
 
-Suppose that the SfM problem consists of :math:`p` cameras and
-:math:`q` points and the variable vector :math:`x` has the block
-structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`. Where,
-:math:`y` and :math:`z` correspond to camera and point parameters,
-respectively.  Further, let the camera blocks be of size :math:`c` and
-the point blocks be of size :math:`s` (for most problems :math:`c` =
-:math:`6`--`9` and :math:`s = 3`). Ceres does not impose any constancy
-requirement on these block sizes, but choosing them to be constant
-simplifies the exposition.
+Suppose that the bundle adjustment problem consists of :math:`p`
+cameras and :math:`q` points and the variable vector :math:`x` has the
+block structure :math:`x = [y_{1}, ... ,y_{p},z_{1},
+... ,z_{q}]`. Where, :math:`y` and :math:`z` correspond to camera and
+point parameters respectively.  Further, let the camera blocks be of
+size :math:`c` and the point blocks be of size :math:`s` (for most
+problems :math:`c` = :math:`6`--`9` and :math:`s = 3`). Ceres does not
+impose any constancy requirement on these block sizes, but choosing
+them to be constant simplifies the exposition.
 
-A key characteristic of the bundle adjustment problem is that there is
-no term :math:`f_{i}` that includes two or more point blocks.  This in
-turn implies that the matrix :math:`H` is of the form
+The key property of bundle adjustment problems which we will exploit
+is the fact that no term :math:`f_{i}` in :eq:`nonlinsq` includes two
+or more point blocks at the same time.  This in turn implies that the
+matrix :math:`H` is of the form
 
 .. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
    :label: hblock
@@ -585,123 +730,208 @@
 observe at least one common point.
 
 
-Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
-:math:`\Delta y`, and then back-substituting :math:`\Delta y` to
+Now :eq:`linear2` can be solved by first forming :math:`S`, solving
+for :math:`\Delta y`, and then back-substituting :math:`\Delta y` to
 obtain the value of :math:`\Delta z`.  Thus, the solution of what was
 an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
 inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
 and matrix-vector multiplies, and the solution of block sparse
 :math:`pc\times pc` linear system :eq:`schur`.  For almost all
 problems, the number of cameras is much smaller than the number of
-points, :math:`p \ll q`, thus solving :eq:`schur` is
-significantly cheaper than solving :eq:`linear2`. This is the
-*Schur complement trick* [Brown]_.
+points, :math:`p \ll q`, thus solving :eq:`schur` is significantly
+cheaper than solving :eq:`linear2`. This is the *Schur complement
+trick* [Brown]_.
 
-This still leaves open the question of solving :eq:`schur`. The
-method of choice for solving symmetric positive definite systems
-exactly is via the Cholesky factorization [TrefethenBau]_ and
-depending upon the structure of the matrix, there are, in general, two
-options. The first is direct factorization, where we store and factor
-:math:`S` as a dense matrix [TrefethenBau]_. This method has
+This still leaves open the question of solving :eq:`schur`. As we
+discussed when considering the exact solution of the normal equations
+using Cholesky factorization, we have two options.
+
+1. ``DENSE_SCHUR`` - The first is **dense Cholesky factorization**,
+where we store and factor :math:`S` as a dense matrix. This method has
 :math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
-is only practical for problems with up to a few hundred cameras. Ceres
-implements this strategy as the ``DENSE_SCHUR`` solver.
+is only practical for problems with up to a few hundred cameras.
 
-
-But, :math:`S` is typically a fairly sparse matrix, as most images
-only see a small fraction of the scene. This leads us to the second
-option: Sparse Direct Methods. These methods store :math:`S` as a
+2. ``SPARSE_SCHUR`` - For large bundle adjustment problems :math:`S`
+is typically a fairly sparse matrix, as most images only see a small
+fraction of the scene. This leads us to the second option: **sparse
+Cholesky factorization** [Davis]_.  Here we store :math:`S` as a
 sparse matrix, use row and column re-ordering algorithms to maximize
 the sparsity of the Cholesky decomposition, and focus their compute
-effort on the non-zero part of the factorization [Chen]_. Sparse
-direct methods, depending on the exact sparsity structure of the Schur
-complement, allow bundle adjustment algorithms to significantly scale
-up over those based on dense factorization. Ceres implements this
-strategy as the ``SPARSE_SCHUR`` solver.
+effort on the non-zero part of the factorization [Davis]_
+[Chen]_. Sparse direct methods, depending on the exact sparsity
+structure of the Schur complement, allow bundle adjustment algorithms
+to scale to scenes with thousands of cameras.
+
 
 .. _section-iterative_schur:
 
-``ITERATIVE_SCHUR``
--------------------
+ITERATIVE_SCHUR
+---------------
 
-Another option for bundle adjustment problems is to apply
-Preconditioned Conjugate Gradients to the reduced camera matrix
-:math:`S` instead of :math:`H`. One reason to do this is that
-:math:`S` is a much smaller matrix than :math:`H`, but more
-importantly, it can be shown that :math:`\kappa(S)\leq \kappa(H)`.
+Another option for bundle adjustment problems is to apply Conjugate
+Gradients to the reduced camera matrix :math:`S` instead of
+:math:`H`. One reason to do this is that :math:`S` is a much smaller
+matrix than :math:`H`, but more importantly, it can be shown that
+:math:`\kappa(S)\leq \kappa(H)` [Agarwal]_.
+
 Ceres implements Conjugate Gradients on :math:`S` as the
 ``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
 as the linear solver, Ceres automatically switches from the exact step
 algorithm to an inexact step algorithm.
 
+
 The key computational operation when using Conjuagate Gradients is the
 evaluation of the matrix vector product :math:`Sx` for an arbitrary
-vector :math:`x`. There are two ways in which this product can be
-evaluated, and this can be controlled using
-``Solver::Options::use_explicit_schur_complement``. Depending on the
-problem at hand, the performance difference between these two methods
-can be quite substantial.
+vector :math:`x`. Because Conjugate Gradients only needs access to
+:math:`S` via its product with a vector, one way to evaluate
+:math:`Sx` is to observe that
 
-  1. **Implicit** This is default. Implicit evaluation is suitable for
-     large problems where the cost of computing and storing the Schur
-     Complement :math:`S` is prohibitive. Because PCG only needs
-     access to :math:`S` via its product with a vector, one way to
-     evaluate :math:`Sx` is to observe that
+.. math::  x_1 &= E^\top x\\
+           x_2 &= C^{-1} x_1\\
+           x_3 &= Ex_2\\
+           x_4 &= Bx\\
+           Sx &= x_4 - x_3
+   :label: schurtrick1
 
-     .. math::  x_1 &= E^\top x\\
-                x_2 &= C^{-1} x_1\\
-                x_3 &= Ex_2\\
-                x_4 &= Bx\\
-                Sx &= x_4 - x_3
-        :label: schurtrick1
+Thus, we can run Conjugate Gradients on :math:`S` with the same
+computational effort per iteration as Conjugate Gradients on
+:math:`H`, while reaping the benefits of a more powerful
+preconditioner. In fact, we do not even need to compute :math:`H`;
+:eq:`schurtrick1` can be implemented using just the columns of
+:math:`J`.
 
-     Thus, we can run PCG on :math:`S` with the same computational
-     effort per iteration as PCG on :math:`H`, while reaping the
-     benefits of a more powerful preconditioner. In fact, we do not
-     even need to compute :math:`H`, :eq:`schurtrick1` can be
-     implemented using just the columns of :math:`J`.
+Equation :eq:`schurtrick1` is closely related to *Domain Decomposition
+methods* for solving large linear systems that arise in structural
+engineering and partial differential equations. In the language of
+Domain Decomposition, each point in a bundle adjustment problem is a
+domain, and the cameras form the interface between these domains. The
+iterative solution of the Schur complement then falls within the
+sub-category of techniques known as Iterative Sub-structuring [Saad]_
+[Mathew]_.
 
-     Equation :eq:`schurtrick1` is closely related to *Domain
-     Decomposition methods* for solving large linear systems that
-     arise in structural engineering and partial differential
-     equations. In the language of Domain Decomposition, each point in
-     a bundle adjustment problem is a domain, and the cameras form the
-     interface between these domains. The iterative solution of the
-     Schur complement then falls within the sub-category of techniques
-     known as Iterative Sub-structuring [Saad]_ [Mathew]_.
+While in most cases the above method for evaluating :math:`Sx` is the
+way to go, for some problems it is better to compute the Schur
+complement :math:`S` explicitly and then run Conjugate Gradients on
+it. This can be done by setting
+:member:`Solver::Options::use_explicit_schur_complement` to
+``true``. This option can only be used with the ``SCHUR_JACOBI``
+preconditioner.
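+
+For example (a sketch):
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = ITERATIVE_SCHUR;
+   Solver::Options::preconditioner_type = SCHUR_JACOBI;
+   Solver::Options::use_explicit_schur_complement = true;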
 
-  2. **Explicit** The complexity of implicit matrix-vector product
-     evaluation scales with the number of non-zeros in the
-     Jacobian. For small to medium sized problems, the cost of
-     constructing the Schur Complement is small enough that it is
-     better to construct it explicitly in memory and use it to
-     evaluate the product :math:`Sx`.
 
-When the user chooses ``ITERATIVE_SCHUR`` as the linear solver, Ceres
-automatically switches from the exact step algorithm to an inexact
-step algorithm.
+.. _section-schur_power_series_expansion:
 
-  .. NOTE::
+SCHUR_POWER_SERIES_EXPANSION
+----------------------------
 
-     In exact arithmetic, the choice of implicit versus explicit Schur
-     complement would have no impact on solution quality. However, in
-     practice if the Jacobian is poorly conditioned, one may observe
-     (usually small) differences in solution quality. This is a
-     natural consequence of performing computations in finite arithmetic.
+It can be shown that the inverse of the Schur complement can be
+written as an infinite power-series [Weber]_ [Zheng]_:
 
+.. math:: S      &= B - EC^{-1}E^\top\\
+                 &= B(I - B^{-1}EC^{-1}E^\top)\\
+          S^{-1} &= (I - B^{-1}EC^{-1}E^\top)^{-1} B^{-1}\\
+                 &= \sum_{i=0}^\infty \left(B^{-1}EC^{-1}E^\top\right)^{i} B^{-1}
+
+As a result a truncated version of this power series expansion can be
+used to approximate the inverse and therefore the solution to
+:eq:`schur`. Ceres allows the user to use Schur power series expansion
+in three ways.
+
+1. As a linear solver. This is what [Weber]_ calls **Power Bundle
+   Adjustment** and corresponds to using the truncated power series to
+   approximate the inverse of the Schur complement. This is done by
+   setting the following options.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR;
+      Solver::Options::preconditioner_type = IDENTITY;
+      Solver::Options::use_spse_initialization = true;
+      Solver::Options::max_linear_solver_iterations = 0;
+
+      // The following two settings are worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+      Solver::Options::spse_tolerance = 0.1;
+
+
+2. As a preconditioner for ``ITERATIVE_SCHUR``. Any method for
+   approximating the inverse of a matrix can also be used as a
+   preconditioner. This is enabled by setting the following options.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR;
+      Solver::Options::preconditioner_type = SCHUR_POWER_SERIES_EXPANSION;
+      Solver::Options::use_spse_initialization = false;
+
+      // This is worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+
+
+3. As initialization for ``ITERATIVE_SCHUR`` with any
+   preconditioner. This is a combination of the above two, where the
+   Schur Power Series Expansion is used to compute the initial guess
+   for the solution of :eq:`schur`, which is then refined by the
+   preconditioned Conjugate Gradients iterations.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR;
+      Solver::Options::preconditioner_type = ... // Preconditioner of your choice.
+      Solver::Options::use_spse_initialization = true;
+      Solver::Options::max_linear_solver_iterations = 0;
+
+      // The following two settings are worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+      // This only affects the initialization but not the preconditioner.
+      Solver::Options::spse_tolerance = 0.1;
+
+
+.. _section-mixed-precision:
+
+Mixed Precision Solves
+======================
+
+Generally speaking Ceres Solver does all its arithmetic in double
+precision. Sometimes though, one can use single precision arithmetic
+to get substantial speedups. Currently, for linear solvers that
+perform Cholesky factorization (sparse or dense) the user has the
+option to cast the linear system to single precision and then use
+single precision Cholesky factorization routines to solve the
+resulting linear system. This can be enabled by setting
+:member:`Solver::Options::use_mixed_precision_solves` to ``true``.
+
+Depending on the conditioning of the problem, the use of single
+precision factorization may lead to some loss of accuracy. Some of
+this accuracy can be recovered by performing `Iterative Refinement
+<https://en.wikipedia.org/wiki/Iterative_refinement>`_. The number of
+iterations of iterative refinement is controlled by
+:member:`Solver::Options::max_num_refinement_iterations`. The default
+value of this parameter is zero, which means if
+:member:`Solver::Options::use_mixed_precision_solves` is ``true``,
+then no iterative refinement is performed. Usually 2-3 refinement
+iterations are enough.
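+
+For example, a sketch of a mixed precision sparse Cholesky solve with
+iterative refinement, using the options named above:
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = SPARSE_NORMAL_CHOLESKY;
+   Solver::Options::use_mixed_precision_solves = true;
+   Solver::Options::max_num_refinement_iterations = 3;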
+
+Mixed precision solves are available in the following linear solver
+configurations:
+
+1. ``DENSE_NORMAL_CHOLESKY`` + ``EIGEN`` / ``LAPACK`` / ``CUDA``.
+2. ``DENSE_SCHUR`` + ``EIGEN`` / ``LAPACK`` / ``CUDA``.
+3. ``SPARSE_NORMAL_CHOLESKY`` + ``EIGEN_SPARSE`` / ``ACCELERATE_SPARSE``.
+4. ``SPARSE_SCHUR`` + ``EIGEN_SPARSE`` / ``ACCELERATE_SPARSE``.
+
+Mixed precision solves are not available when using ``SUITE_SPARSE``
+as the sparse linear algebra backend because SuiteSparse/CHOLMOD does
+not support single precision solves.
 
 .. _section-preconditioner:
 
-Preconditioner
-==============
+Preconditioners
+===============
 
-The convergence rate of Conjugate Gradients for
-solving :eq:`normal` depends on the distribution of eigenvalues
-of :math:`H` [Saad]_. A useful upper bound is
-:math:`\sqrt{\kappa(H)}`, where, :math:`\kappa(H)` is the condition
-number of the matrix :math:`H`. For most bundle adjustment problems,
-:math:`\kappa(H)` is high and a direct application of Conjugate
-Gradients to :eq:`normal` results in extremely poor performance.
+The convergence rate of Conjugate Gradients for solving :eq:`normal`
+depends on the distribution of eigenvalues of :math:`H` [Saad]_. A
+useful upper bound is :math:`\sqrt{\kappa(H)}`, where,
+:math:`\kappa(H)` is the condition number of the matrix :math:`H`. For
+most non-linear least squares problems, :math:`\kappa(H)` is high and
+a direct application of Conjugate Gradients to :eq:`normal` results in
+extremely poor performance.
 
 The solution to this problem is to replace :eq:`normal` with a
 *preconditioned* system.  Given a linear system, :math:`Ax =b` and a
@@ -733,8 +963,15 @@
 problems with general sparsity as well as the special sparsity
 structure encountered in bundle adjustment problems.
 
-``JACOBI``
-----------
+IDENTITY
+--------
+
+This is equivalent to using an identity matrix as a preconditioner,
+i.e. no preconditioner at all.
+
+
+JACOBI
+------
 
 The simplest of all preconditioners is the diagonal or Jacobi
 preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
@@ -742,24 +979,27 @@
 block Jacobi preconditioner. The ``JACOBI`` preconditioner in Ceres
 when used with :ref:`section-cgnr` refers to the block diagonal of
 :math:`H` and when used with :ref:`section-iterative_schur` refers to
-the block diagonal of :math:`B` [Mandel]_. For detailed performance
-data about the performance of ``JACOBI`` on bundle adjustment problems
-see [Agarwal]_.
+the block diagonal of :math:`B` [Mandel]_.
+
+For detailed performance data about ``JACOBI`` on bundle adjustment
+problems see [Agarwal]_.
 
 
-``SCHUR_JACOBI``
-----------------
+SCHUR_JACOBI
+------------
 
 Another obvious choice for :ref:`section-iterative_schur` is the block
 diagonal of the Schur complement matrix :math:`S`, i.e, the block
 Jacobi preconditioner for :math:`S`. In Ceres we refer to it as the
-``SCHUR_JACOBI`` preconditioner.  For detailed performance data about
-the performance of ``SCHUR_JACOBI`` on bundle adjustment problems see
-[Agarwal]_.
+``SCHUR_JACOBI`` preconditioner.
 
 
-``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``
-----------------------------------------------
+For detailed performance data about ``SCHUR_JACOBI`` on bundle
+adjustment problems see [Agarwal]_.
+
+
+CLUSTER_JACOBI and CLUSTER_TRIDIAGONAL
+--------------------------------------
 
 For bundle adjustment problems arising in reconstruction from
 community photo collections, more effective preconditioners can be
@@ -790,8 +1030,19 @@
 algorithm. The choice of clustering algorithm is controlled by
 :member:`Solver::Options::visibility_clustering_type`.
 
-``SUBSET``
-----------
+SCHUR_POWER_SERIES_EXPANSION
+----------------------------
+
+As explained in :ref:`section-schur_power_series_expansion`, the Schur
+complement matrix admits a power series expansion and a truncated
+version of this power series can be used as a preconditioner for
+``ITERATIVE_SCHUR``. When used as a preconditioner
+:member:`Solver::Options::max_num_spse_iterations` controls the number
+of terms in the power series that are used.
+
+
+SUBSET
+------
 
 This is a preconditioner for problems with general sparsity. Given a
 subset of residual blocks of a problem, it uses the corresponding
@@ -811,6 +1062,8 @@
 :math:`Q` approximates :math:`J^\top J`, or how well the chosen
 residual blocks approximate the full problem.
 
+This preconditioner is NOT available when running ``CGNR`` using
+``CUDA``.
 
 .. _section-ordering:
 
@@ -821,33 +1074,36 @@
 have a significant impact on the efficiency and accuracy of the
 method. For example when doing sparse Cholesky factorization, there
 are matrices for which a good ordering will give a Cholesky factor
-with :math:`O(n)` storage, where as a bad ordering will result in an
+with :math:`O(n)` storage, whereas a bad ordering will result in a
 completely dense factor.
 
 Ceres allows the user to provide varying amounts of hints to the
 solver about the variable elimination ordering to use. This can range
-from no hints, where the solver is free to decide the best ordering
-based on the user's choices like the linear solver being used, to an
-exact order in which the variables should be eliminated, and a variety
-of possibilities in between.
+from no hints, where the solver is free to decide the best possible
+ordering based on the user's choices like the linear solver being
+used, to an exact order in which the variables should be eliminated,
+and a variety of possibilities in between.
 
-Instances of the :class:`ParameterBlockOrdering` class are used to
-communicate this information to Ceres.
+The simplest thing to do is to just set
+:member:`Solver::Options::linear_solver_ordering_type` to ``AMD``
+(default) or ``NESDIS`` based on your understanding of the problem or
+empirical testing.
 
-Formally an ordering is an ordered partitioning of the parameter
-blocks. Each parameter block belongs to exactly one group, and each
-group has a unique integer associated with it, that determines its
-order in the set of groups. We call these groups *Elimination Groups*
 
-Given such an ordering, Ceres ensures that the parameter blocks in the
-lowest numbered elimination group are eliminated first, and then the
-parameter blocks in the next lowest numbered elimination group and so
-on. Within each elimination group, Ceres is free to order the
-parameter blocks as it chooses. For example, consider the linear system
+More information can be communicated by using an instance of the
+:class:`ParameterBlockOrdering` class.
+
+Formally an ordering is an ordered partitioning of the
+parameter blocks, i.e., each parameter block belongs to exactly
+one group, and each group has a unique non-negative integer
+associated with it that determines its order in the set of
+groups.
+
+For example, consider the linear system
 
 .. math::
-  x + y &= 3\\
-  2x + 3y &= 7
+   x + y &= 3 \\
+   2x + 3y &= 7
 
 There are two ways in which it can be solved. First eliminating
 :math:`x` from the two equations, solving for :math:`y` and then back
@@ -855,33 +1111,92 @@
 for :math:`x` and back substituting for :math:`y`. The user can
 construct three orderings here.
 
-1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
-2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
-3. :math:`\{0: x, y\}`        : Solver gets to decide the elimination order.
+1. :math:`\{0: x\}, \{1: y\}` - eliminate :math:`x` first.
+2. :math:`\{0: y\}, \{1: x\}` - eliminate :math:`y` first.
+3. :math:`\{0: x, y\}` - Solver gets to decide the elimination order.
 
-Thus, to have Ceres determine the ordering automatically using
-heuristics, put all the variables in the same elimination group. The
-identity of the group does not matter. This is the same as not
-specifying an ordering at all. To control the ordering for every
-variable, create an elimination group per variable, ordering them in
-the desired order.
+Thus, to have Ceres determine the ordering automatically, put all the
+variables in group 0. To control the ordering for every variable,
+create groups :math:`0 \dots N-1`, one per variable, in the desired
+order.
+
+``linear_solver_ordering == nullptr`` and an ordering where all the
+parameter blocks are in one elimination group mean the same thing -
+the solver is free to choose what it thinks is the best elimination
+ordering using the ordering algorithm (specified using
+:member:`Solver::Options::linear_solver_ordering_type`). Therefore in
+the following we will only consider the case where
+``linear_solver_ordering != nullptr``.
+
+The exact interpretation of the ``linear_solver_ordering`` depends on
+the values of :member:`Solver::Options::linear_solver_ordering_type`,
+:member:`Solver::Options::linear_solver_type`,
+:member:`Solver::Options::preconditioner_type` and
+:member:`Solver::Options::sparse_linear_algebra_library_type` as we will
+explain below.
+
+Bundle Adjustment
+-----------------
 
 If the user is using one of the Schur solvers (``DENSE_SCHUR``,
 ``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses to specify an
 ordering, it must have one important property. The lowest numbered
 elimination group must form an independent set in the graph
 corresponding to the Hessian, or in other words, no two parameter
-blocks in in the first elimination group should co-occur in the same
+blocks in the first elimination group should co-occur in the same
 residual block. For the best performance, this elimination group
 should be as large as possible. For standard bundle adjustment
 problems, this corresponds to the first elimination group containing
-all the 3d points, and the second containing the all the cameras
-parameter blocks.
+all the 3d points, and the second containing the parameter blocks for
+all the cameras.
 
 If the user leaves the choice to Ceres, then the solver uses an
 approximate maximum independent set algorithm to identify the first
 elimination group [LiSaad]_.
 
+``sparse_linear_algebra_library_type = SUITE_SPARSE``
+-----------------------------------------------------
+
+**linear_solver_ordering_type = AMD**
+
+A constrained Approximate Minimum Degree (CAMD) ordering is used where
+the parameter blocks in the lowest numbered group are eliminated
+first, and then the parameter blocks in the next lowest numbered group
+and so on. Within each group, CAMD is free to order the parameter blocks
+as it chooses.
+
+**linear_solver_ordering_type = NESDIS**
+
+a. ``linear_solver_type = SPARSE_NORMAL_CHOLESKY`` or
+   ``linear_solver_type = CGNR`` and ``preconditioner_type = SUBSET``
+
+   The value of ``linear_solver_ordering`` is ignored and a Nested
+   Dissection algorithm is used to compute a fill reducing ordering.
+
+b. ``linear_solver_type = SPARSE_SCHUR/DENSE_SCHUR/ITERATIVE_SCHUR``
+
+   Only the lowest numbered group is used to compute the Schur
+   complement, and Nested Dissection is used to compute a fill
+   reducing ordering for the Schur complement (or its
+   preconditioner).
+
+``sparse_linear_algebra_library_type = EIGEN_SPARSE/ACCELERATE_SPARSE``
+-----------------------------------------------------------------------
+
+a. ``linear_solver_type = SPARSE_NORMAL_CHOLESKY`` or
+   ``linear_solver_type = CGNR`` and ``preconditioner_type = SUBSET``
+
+   The value of ``linear_solver_ordering`` is ignored and ``AMD`` or
+   ``NESDIS`` is used to compute a fill reducing ordering as requested
+   by the user.
+
+b. ``linear_solver_type = SPARSE_SCHUR/DENSE_SCHUR/ITERATIVE_SCHUR``
+
+   Only the lowest numbered group is used to compute the Schur
+   complement, and ``AMD`` or ``NESDIS`` is used to compute a fill
+   reducing ordering for the Schur complement (or its preconditioner)
+   as requested by the user.
+
+
 .. _section-solver-options:
 
 :class:`Solver::Options`
@@ -892,7 +1207,7 @@
    :class:`Solver::Options` controls the overall behavior of the
    solver. We list the various settings and their default values below.
 
-.. function:: bool Solver::Options::IsValid(string* error) const
+.. function:: bool Solver::Options::IsValid(std::string* error) const
 
    Validate the values in the options struct and returns true on
    success. If there is a problem, the method returns false with
@@ -913,14 +1228,18 @@
    Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
    ``BFGS`` and ``LBFGS``.
 
+   See :ref:`section-line-search-methods` for more details.
+
 .. member:: LineSearchType Solver::Options::line_search_type
 
    Default: ``WOLFE``
 
    Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
    Note that in order for the assumptions underlying the ``BFGS`` and
-   ``LBFGS`` line search direction algorithms to be guaranteed to be
-   satisifed, the ``WOLFE`` line search should be used.
+   ``LBFGS`` line search direction algorithms to be satisfied, the
+   ``WOLFE`` line search must be used.
+
+   See :ref:`section-line-search-methods` for more details.
 
 .. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type
 
@@ -931,26 +1250,28 @@
 
 .. member:: int Solver::Options::max_lbfgs_rank
 
-   Default: 20
+   Default: ``20``
 
-   The L-BFGS hessian approximation is a low rank approximation to the
-   inverse of the Hessian matrix. The rank of the approximation
-   determines (linearly) the space and time complexity of using the
-   approximation. Higher the rank, the better is the quality of the
-   approximation. The increase in quality is however is bounded for a
-   number of reasons.
+   The LBFGS Hessian approximation is a low rank approximation to
+   the inverse of the Hessian matrix. The rank of the
+   approximation determines (linearly) the space and time
+   complexity of using the approximation. The higher the rank, the
+   better the quality of the approximation. The increase in
+   quality is however bounded for a number of reasons.
 
-     1. The method only uses secant information and not actual
-        derivatives.
+   1. The method only uses secant information and not actual
+   derivatives.
+   2. The Hessian approximation is constrained to be positive
+   definite.
 
-     2. The Hessian approximation is constrained to be positive
-        definite.
+   So increasing this rank to a large number will increase the time
+   and space complexity without a corresponding increase in solution
+   quality. There are no hard and fast rules for choosing the
+   maximum rank. The best choice usually requires some problem
+   specific experimentation.
 
-   So increasing this rank to a large number will cost time and space
-   complexity without the corresponding increase in solution
-   quality. There are no hard and fast rules for choosing the maximum
-   rank. The best choice usually requires some problem specific
-   experimentation.
+   For more theoretical and implementation details of the LBFGS
+   method, please see [Nocedal]_.
 
 .. member:: bool Solver::Options::use_approximate_eigenvalue_bfgs_scaling
 
@@ -999,6 +1320,8 @@
 
 .. member:: double Solver::Options::min_line_search_step_size
 
+   Default: ``1e-9``
+
    The line search terminates if:
 
    .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}
@@ -1153,7 +1476,8 @@
 
 .. member:: double Solver::Options::max_solver_time_in_seconds
 
-   Default: ``1e6``
+   Default: ``1e9``
+
    Maximum amount of time for which the solver should run.
 
 .. member:: int Solver::Options::num_threads
@@ -1239,8 +1563,8 @@
 
    where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
    is projection onto the bounds constraints and :math:`\boxplus` is
-   Plus operation for the overall local parameterization associated
-   with the parameter vector.
+   Plus operation for the overall manifold associated with the
+   parameter vector.
 
 .. member:: double Solver::Options::parameter_tolerance
 
@@ -1260,7 +1584,7 @@
    Type of linear solver used to compute the solution to the linear
    least squares problem in each iteration of the Levenberg-Marquardt
    algorithm. If Ceres is built with support for ``SuiteSparse`` or
-   ``CXSparse`` or ``Eigen``'s sparse Cholesky factorization, the
+   ``Accelerate`` or ``Eigen``'s sparse Cholesky factorization, the
    default is ``SPARSE_NORMAL_CHOLESKY``, it is ``DENSE_QR``
    otherwise.
 
@@ -1271,8 +1595,9 @@
    The preconditioner used by the iterative linear solver. The default
    is the block Jacobi preconditioner. Valid values are (in increasing
    order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``,
-   ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See
-   :ref:`section-preconditioner` for more details.
+   ``CLUSTER_JACOBI``, ``CLUSTER_TRIDIAGONAL``, ``SUBSET`` and
+   ``SCHUR_POWER_SERIES_EXPANSION``. See :ref:`section-preconditioner`
+   for more details.
 
 .. member:: VisibilityClusteringType Solver::Options::visibility_clustering_type
 
@@ -1296,7 +1621,7 @@
    recommend that you try ``CANONICAL_VIEWS`` first and if it is too
    expensive try ``SINGLE_LINKAGE``.
 
-.. member:: std::unordered_set<ResidualBlockId> residual_blocks_for_subset_preconditioner
+.. member:: std::unordered_set<ResidualBlockId> Solver::Options::residual_blocks_for_subset_preconditioner
 
    ``SUBSET`` preconditioner is a preconditioner for problems with
    general sparsity. Given a subset of residual blocks of a problem,
@@ -1321,65 +1646,76 @@
 
 .. member:: DenseLinearAlgebraLibrary Solver::Options::dense_linear_algebra_library_type
 
-   Default:``EIGEN``
+   Default: ``EIGEN``
 
    Ceres supports using multiple dense linear algebra libraries for
-   dense matrix factorizations. Currently ``EIGEN`` and ``LAPACK`` are
-   the valid choices. ``EIGEN`` is always available, ``LAPACK`` refers
-   to the system ``BLAS + LAPACK`` library which may or may not be
-   available.
+   dense matrix factorizations. Currently ``EIGEN``, ``LAPACK`` and
+   ``CUDA`` are the valid choices. ``EIGEN`` is always available,
+   ``LAPACK`` refers to the system ``BLAS + LAPACK`` library which may
+   or may not be available. ``CUDA`` refers to Nvidia's GPU based
+   dense linear algebra library which may or may not be available.
 
    This setting affects the ``DENSE_QR``, ``DENSE_NORMAL_CHOLESKY``
-   and ``DENSE_SCHUR`` solvers. For small to moderate sized probem
+   and ``DENSE_SCHUR`` solvers. For small to moderate sized problems
    ``EIGEN`` is a fine choice but for large problems, an optimized
-   ``LAPACK + BLAS`` implementation can make a substantial difference
-   in performance.
+   ``LAPACK + BLAS`` or ``CUDA`` implementation can make a substantial
+   difference in performance.
 
 .. member:: SparseLinearAlgebraLibrary Solver::Options::sparse_linear_algebra_library_type
 
    Default: The highest available according to: ``SUITE_SPARSE`` >
-   ``CX_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``
+   ``ACCELERATE_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``
 
    Ceres supports the use of three sparse linear algebra libraries,
    ``SuiteSparse``, which is enabled by setting this parameter to
-   ``SUITE_SPARSE``, ``CXSparse``, which can be selected by setting
-   this parameter to ``CX_SPARSE`` and ``Eigen`` which is enabled by
-   setting this parameter to ``EIGEN_SPARSE``.  Lastly, ``NO_SPARSE``
-   means that no sparse linear solver should be used; note that this is
-   irrespective of whether Ceres was compiled with support for one.
+   ``SUITE_SPARSE``, ``Accelerate``, which can be selected by setting
+   this parameter to ``ACCELERATE_SPARSE`` and ``Eigen`` which is
+   enabled by setting this parameter to ``EIGEN_SPARSE``.  Lastly,
+   ``NO_SPARSE`` means that no sparse linear solver should be used;
+   note that this is irrespective of whether Ceres was compiled with
+   support for one.
 
-   ``SuiteSparse`` is a sophisticated and complex sparse linear
-   algebra library and should be used in general.
+   ``SuiteSparse`` is a sophisticated sparse linear algebra library
+   and should be used in general. On MacOS you may want to use the
+   ``Accelerate`` framework.
 
    If your needs/platforms prevent you from using ``SuiteSparse``,
-   consider using ``CXSparse``, which is a much smaller, easier to
-   build library. As can be expected, its performance on large
-   problems is not comparable to that of ``SuiteSparse``.
+   consider using the sparse linear algebra routines in ``Eigen``. The
+   sparse Cholesky algorithms currently included with ``Eigen`` are
+   not as sophisticated as the ones in ``SuiteSparse`` and
+   ``Accelerate`` and as a result its performance is considerably
+   worse.
 
-   Last but not the least you can use the sparse linear algebra
-   routines in ``Eigen``. Currently the performance of this library is
-   the poorest of the three. But this should change in the near
-   future.
+.. member:: LinearSolverOrderingType Solver::Options::linear_solver_ordering_type
 
-   Another thing to consider here is that the sparse Cholesky
-   factorization libraries in Eigen are licensed under ``LGPL`` and
-   building Ceres with support for ``EIGEN_SPARSE`` will result in an
-   LGPL licensed library (since the corresponding code from Eigen is
-   compiled into the library).
+   Default: ``AMD``
 
-   The upside is that you do not need to build and link to an external
-   library to use ``EIGEN_SPARSE``.
+   The order in which variables are eliminated in a linear solver can
+   have a significant impact on the efficiency and accuracy of the
+   method. For example, when doing sparse Cholesky factorization,
+   there are matrices for which a good ordering will give a Cholesky
+   factor with :math:`O(n)` storage, whereas a bad ordering will
+   result in a completely dense factor.
 
+   Sparse direct solvers like ``SPARSE_NORMAL_CHOLESKY`` and
+   ``SPARSE_SCHUR`` use a fill reducing ordering of the columns and
+   rows of the matrix being factorized before computing the numeric
+   factorization.
 
-.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering
+   This enum controls the type of algorithm used to compute this fill
+   reducing ordering. There is no single algorithm that works on all
+   matrices, so determining which algorithm works better is a matter
+   of empirical experimentation.
 
-   Default: ``NULL``
+.. member:: std::shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering
+
+   Default: ``nullptr``
 
    An instance of the ordering object informs the solver about the
    desired order in which parameter blocks should be eliminated by the
-   linear solvers. See section~\ref{sec:ordering`` for more details.
+   linear solvers.
 
-   If ``NULL``, the solver is free to choose an ordering that it
+   If ``nullptr``, the solver is free to choose an ordering that it
    thinks is best.
 
    See :ref:`section-ordering` for more details.
@@ -1404,41 +1740,15 @@
    efficient to explicitly compute it and use it for evaluating the
    matrix-vector products.
 
-   Enabling this option tells ``ITERATIVE_SCHUR`` to use an explicitly
-   computed Schur complement. This can improve the performance of the
-   ``ITERATIVE_SCHUR`` solver significantly.
-
-   .. NOTE:
+   .. NOTE::
 
      This option can only be used with the ``SCHUR_JACOBI``
      preconditioner.
 
-.. member:: bool Solver::Options::use_post_ordering
+.. member:: bool Solver::Options::dynamic_sparsity
 
    Default: ``false``
 
-   Sparse Cholesky factorization algorithms use a fill-reducing
-   ordering to permute the columns of the Jacobian matrix. There are
-   two ways of doing this.
-
-   1. Compute the Jacobian matrix in some order and then have the
-      factorization algorithm permute the columns of the Jacobian.
-
-   2. Compute the Jacobian with its columns already permuted.
-
-   The first option incurs a significant memory penalty. The
-   factorization algorithm has to make a copy of the permuted Jacobian
-   matrix, thus Ceres pre-permutes the columns of the Jacobian matrix
-   and generally speaking, there is no performance penalty for doing
-   so.
-
-   In some rare cases, it is worth using a more complicated reordering
-   algorithm which has slightly better runtime performance at the
-   expense of an extra copy of the Jacobian matrix. Setting
-   ``use_postordering`` to ``true`` enables this tradeoff.
-
-.. member:: bool Solver::Options::dynamic_sparsity
-
   Some non-linear least squares problems are symbolically dense but
   numerically sparse, i.e., at any given state only a small number of
    Jacobian entries are non-zero, but the position and number of
@@ -1453,6 +1763,29 @@
 
    This setting only affects the `SPARSE_NORMAL_CHOLESKY` solver.
 
+.. member:: bool Solver::Options::use_mixed_precision_solves
+
+   Default: ``false``
+
+   If true, the Gauss-Newton matrix is computed in *double* precision,
+   but its factorization is computed in *single* precision. This can
+   result in significant time and memory savings at the cost of some
+   accuracy in the Gauss-Newton step. Iterative refinement is used to
+   recover some of this accuracy.
+
+   If ``use_mixed_precision_solves`` is true, we recommend setting
+   ``max_num_refinement_iterations`` to 2-3.
+
+   See :ref:`section-mixed-precision` for more details.
+
+.. member:: int Solver::Options::max_num_refinement_iterations
+
+   Default: ``0``
+
+   Number of steps of the iterative refinement process to run when
+   computing the Gauss-Newton step, see
+   :member:`Solver::Options::use_mixed_precision_solves`.
+
 .. member:: int Solver::Options::min_linear_solver_iterations
 
    Default: ``0``
@@ -1469,6 +1802,34 @@
    makes sense when the linear solver is an iterative solver, e.g.,
    ``ITERATIVE_SCHUR`` or ``CGNR``.
 
+.. member:: int Solver::Options::max_num_spse_iterations
+
+   Default: ``5``
+
+   Maximum number of iterations performed by
+   ``SCHUR_POWER_SERIES_EXPANSION``. Each iteration corresponds to one
+   more term in the power series expansion of the inverse of the Schur
+   complement. This value controls the maximum number of iterations
+   whether it is used as a preconditioner or just to initialize the
+   solution for ``ITERATIVE_SCHUR``.
+
+.. member:: bool Solver::Options::use_spse_initialization
+
+   Default: ``false``
+
+   Use Schur power series expansion to initialize the solution for
+   ``ITERATIVE_SCHUR``. This option can be set to ``true`` regardless
+   of which preconditioner is being used.
+
+.. member:: double Solver::Options::spse_tolerance
+
+   Default: ``0.1``
+
+   When ``use_spse_initialization`` is ``true``, this parameter along
+   with ``max_num_spse_iterations`` controls the number of
+   ``SCHUR_POWER_SERIES_EXPANSION`` iterations performed for
+   initialization. It is not used to control the preconditioner.
+
 .. member:: double Solver::Options::eta
 
    Default: ``1e-1``
@@ -1502,6 +1863,25 @@
    objects that have an :class:`EvaluationCallback` associated with
    them.
 
+.. member:: std::shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
+
+   Default: ``nullptr``
+
+   If :member:`Solver::Options::use_inner_iterations` is true, then
+   the user has two choices.
+
+   1. Let the solver heuristically decide which parameter blocks to
+      optimize in each inner iteration. To do this, set
+      :member:`Solver::Options::inner_iteration_ordering` to ``nullptr``.
+
+   2. Specify a collection of ordered independent sets. The lower
+      numbered groups are optimized before the higher numbered groups
+      during the inner optimization phase. Each group must be an
+      independent set. Not all parameter blocks need to be included in
+      the ordering.
+
+   See :ref:`section-ordering` for more details.
+
 .. member:: double Solver::Options::inner_iteration_tolerance
 
    Default: ``1e-3``
@@ -1516,37 +1896,21 @@
    inner iterations in subsequent trust region minimizer iterations is
    disabled.
 
-.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
-
-   Default: ``NULL``
-
-   If :member:`Solver::Options::use_inner_iterations` true, then the
-   user has two choices.
-
-   1. Let the solver heuristically decide which parameter blocks to
-      optimize in each inner iteration. To do this, set
-      :member:`Solver::Options::inner_iteration_ordering` to ``NULL``.
-
-   2. Specify a collection of of ordered independent sets. The lower
-      numbered groups are optimized before the higher number groups
-      during the inner optimization phase. Each group must be an
-      independent set. Not all parameter blocks need to be included in
-      the ordering.
-
-   See :ref:`section-ordering` for more details.
 
 .. member:: LoggingType Solver::Options::logging_type
 
    Default: ``PER_MINIMIZER_ITERATION``
 
+   Valid values are ``SILENT`` and ``PER_MINIMIZER_ITERATION``.
+
 .. member:: bool Solver::Options::minimizer_progress_to_stdout
 
    Default: ``false``
 
-   By default the :class:`Minimizer` progress is logged to ``STDERR``
+   By default the Minimizer's progress is logged to ``STDERR``
    depending on the ``vlog`` level. If this flag is set to true, and
-   :member:`Solver::Options::logging_type` is not ``SILENT``, the logging
-   output is sent to ``STDOUT``.
+   :member:`Solver::Options::logging_type` is not ``SILENT``, the
+   logging output is sent to ``STDOUT``.
 
    For ``TRUST_REGION_MINIMIZER`` the progress display looks like
 
@@ -1595,7 +1959,7 @@
   #. ``it`` is the time taken by the current iteration.
    #. ``tt`` is the total time taken by the minimizer.
 
-.. member:: vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
+.. member:: std::vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
 
    Default: ``empty``
 
@@ -1603,7 +1967,7 @@
    the trust region problem. Useful for testing and benchmarking. If
    ``empty``, no problems are dumped.
 
-.. member:: string Solver::Options::trust_region_problem_dump_directory
+.. member:: std::string Solver::Options::trust_region_problem_dump_directory
 
    Default: ``/tmp``
 
@@ -1614,7 +1978,7 @@
     :member:`Solver::Options::trust_region_problem_dump_format_type` is not
     ``CONSOLE``.
 
-.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format
+.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format_type
 
    Default: ``TEXTFILE``
 
@@ -1708,23 +2072,32 @@
    should not expect to look at the parameter blocks and interpret
    their values.
 
-.. member:: vector<IterationCallback> Solver::Options::callbacks
+.. member:: std::vector<IterationCallback*> Solver::Options::callbacks
+
+   Default: ``empty``
 
    Callbacks that are executed at the end of each iteration of the
-   :class:`Minimizer`. They are executed in the order that they are
+   minimizer. They are executed in the order that they are
    specified in this vector.
 
    By default, parameter blocks are updated only at the end of the
-   optimization, i.e., when the :class:`Minimizer` terminates. This
-   means that by default, if an :class:`IterationCallback` inspects
-   the parameter blocks, they will not see them changing in the course
-   of the optimization.
+   optimization, i.e., when the minimizer terminates. This means that
+   by default, if an :class:`IterationCallback` inspects the parameter
+   blocks, they will not see them changing in the course of the
+   optimization.
 
    To tell Ceres to update the parameter blocks at the end of each
    iteration and before calling the user's callback, set
    :member:`Solver::Options::update_state_every_iteration` to
    ``true``.
 
+   See `examples/iteration_callback_example.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/iteration_callback_example.cc>`_
+   for an example of an :class:`IterationCallback` that uses
+   :member:`Solver::Options::update_state_every_iteration` to log
+   changes to the parameter blocks over the course of the
+   optimization.
+
    The solver does NOT take ownership of these pointers.
 
 :class:`ParameterBlockOrdering`
@@ -1786,8 +2159,8 @@
 
    Number of groups with one or more elements.
 
-:class:`IterationCallback`
-==========================
+:class:`IterationSummary`
+=========================
 
 .. class:: IterationSummary
 
@@ -1863,7 +2236,7 @@
 
    Size of the trust region at the end of the current iteration. For
    the Levenberg-Marquardt algorithm, the regularization parameter is
-   1.0 / member::`IterationSummary::trust_region_radius`.
+   1.0 / :member:`IterationSummary::trust_region_radius`.
 
 .. member:: double IterationSummary::eta
 
@@ -1906,6 +2279,8 @@
 
    Time (in seconds) since the user called Solve().
 
+:class:`IterationCallback`
+==========================
 
 .. class:: IterationCallback
 
@@ -1939,6 +2314,10 @@
   #. ``SOLVER_CONTINUE`` indicates that the solver should continue
      optimizing.
 
+  The return values can be used to implement custom termination
+  criteria that supersede the iteration/time/tolerance based
+  termination implemented by Ceres.
+
   For example, the following :class:`IterationCallback` is used
   internally by Ceres to log the progress of the optimization.
 
@@ -1978,6 +2357,12 @@
     };
 
 
+  See `examples/evaluation_callback_example.cc
+  <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/evaluation_callback_example.cc>`_
+  for another example that uses
+  :member:`Solver::Options::update_state_every_iteration` to log
+  changes to the parameter blocks over the course of the optimization.
+
 
 :class:`CRSMatrix`
 ==================
@@ -1995,13 +2380,13 @@
 
    Number of columns.
 
-.. member:: vector<int> CRSMatrix::rows
+.. member:: std::vector<int> CRSMatrix::rows
 
    :member:`CRSMatrix::rows` is a :member:`CRSMatrix::num_rows` + 1
    sized array that points into the :member:`CRSMatrix::cols` and
    :member:`CRSMatrix::values` array.
 
-.. member:: vector<int> CRSMatrix::cols
+.. member:: std::vector<int> CRSMatrix::cols
 
    :member:`CRSMatrix::cols` contain as many entries as there are
    non-zeros in the matrix.
@@ -2009,7 +2394,7 @@
    For each row ``i``, ``cols[rows[i]]`` ... ``cols[rows[i + 1] - 1]``
    are the indices of the non-zero columns of row ``i``.
 
-.. member:: vector<int> CRSMatrix::values
+.. member:: std::vector<double> CRSMatrix::values
 
   :member:`CRSMatrix::values` contains as many entries as there are
    non-zeros in the matrix.
@@ -2043,12 +2428,12 @@
 
    Summary of the various stages of the solver after termination.
 
-.. function:: string Solver::Summary::BriefReport() const
+.. function:: std::string Solver::Summary::BriefReport() const
 
    A brief one line description of the state of the solver after
    termination.
 
-.. function:: string Solver::Summary::FullReport() const
+.. function:: std::string Solver::Summary::FullReport() const
 
    A full multiline description of the state of the solver after
    termination.
@@ -2071,7 +2456,7 @@
 
    The cause of the minimizer terminating.
 
-.. member:: string Solver::Summary::message
+.. member:: std::string Solver::Summary::message
 
    Reason why the solver terminated.
 
@@ -2091,14 +2476,14 @@
    were held fixed by the preprocessor because all the parameter
    blocks that they depend on were fixed.
 
-.. member:: vector<IterationSummary> Solver::Summary::iterations
+.. member:: std::vector<IterationSummary> Solver::Summary::iterations
 
    :class:`IterationSummary` for each minimizer iteration in order.
 
 .. member:: int Solver::Summary::num_successful_steps
 
    Number of minimizer iterations in which the step was
-   accepted. Unless :member:`Solver::Options::use_non_monotonic_steps`
+   accepted. Unless :member:`Solver::Options::use_nonmonotonic_steps`
    is `true` this is also the number of steps in which the objective
    function value/cost went down.
 
@@ -2112,13 +2497,13 @@
 
    Number of times inner iterations were performed.
 
- .. member:: int Solver::Summary::num_line_search_steps
+.. member:: int Solver::Summary::num_line_search_steps
 
-    Total number of iterations inside the line search algorithm across
-    all invocations. We call these iterations "steps" to distinguish
-    them from the outer iterations of the line search and trust region
-    minimizer algorithms which call the line search algorithm as a
-    subroutine.
+   Total number of iterations inside the line search algorithm across
+   all invocations. We call these iterations "steps" to distinguish
+   them from the outer iterations of the line search and trust region
+   minimizer algorithms which call the line search algorithm as a
+   subroutine.
 
 .. member:: double Solver::Summary::preprocessor_time_in_seconds
 
@@ -2126,7 +2511,7 @@
 
 .. member:: double Solver::Summary::minimizer_time_in_seconds
 
-   Time (in seconds) spent in the Minimizer.
+   Time (in seconds) spent in the minimizer.
 
 .. member:: double Solver::Summary::postprocessor_time_in_seconds
 
@@ -2180,7 +2565,7 @@
    Dimension of the tangent space of the problem (or the number of
    columns in the Jacobian for the problem). This is different from
    :member:`Solver::Summary::num_parameters` if a parameter block is
-   associated with a :class:`LocalParameterization`.
+   associated with a :class:`Manifold`.
 
 .. member:: int Solver::Summary::num_residual_blocks
 
@@ -2206,7 +2591,7 @@
    number of columns in the Jacobian for the reduced problem). This is
    different from :member:`Solver::Summary::num_parameters_reduced` if
    a parameter block in the reduced problem is associated with a
-   :class:`LocalParameterization`.
+   :class:`Manifold`.
 
 .. member:: int Solver::Summary::num_residual_blocks_reduced
 
@@ -2224,9 +2609,7 @@
 .. member:: int Solver::Summary::num_threads_used
 
    Number of threads actually used by the solver for Jacobian and
-   residual evaluation. This number is not equal to
-   :member:`Solver::Summary::num_threads_given` if none of `OpenMP`
-   or `CXX_THREADS` is available.
+   residual evaluation.
 
 .. member:: LinearSolverType Solver::Summary::linear_solver_type_given
 
@@ -2242,12 +2625,12 @@
    `SPARSE_NORMAL_CHOLESKY` but no sparse linear algebra library was
    available.
 
-.. member:: vector<int> Solver::Summary::linear_solver_ordering_given
+.. member:: std::vector<int> Solver::Summary::linear_solver_ordering_given
 
    Size of the elimination groups given by the user as hints to the
    linear solver.
 
-.. member:: vector<int> Solver::Summary::linear_solver_ordering_used
+.. member:: std::vector<int> Solver::Summary::linear_solver_ordering_used
 
    Size of the parameter groups used by the solver when ordering the
   columns of the Jacobian.  This may be different from
@@ -2285,12 +2668,12 @@
    actually performed. For example, in a problem with just one parameter
    block, inner iterations are not performed.
 
-.. member:: vector<int> inner_iteration_ordering_given
+.. member:: std::vector<int> Solver::Summary::inner_iteration_ordering_given
 
    Size of the parameter groups given by the user for performing inner
    iterations.
 
-.. member:: vector<int> inner_iteration_ordering_used
+.. member:: std::vector<int> Solver::Summary::inner_iteration_ordering_used
 
   Size of the parameter groups used by the solver for
   performing inner iterations. This may be different from
@@ -2315,7 +2698,7 @@
 
    Type of clustering algorithm used for visibility based
    preconditioning. Only meaningful when the
-   :member:`Solver::Summary::preconditioner_type` is
+   :member:`Solver::Summary::preconditioner_type_used` is
    ``CLUSTER_JACOBI`` or ``CLUSTER_TRIDIAGONAL``.
 
 .. member:: TrustRegionStrategyType Solver::Summary::trust_region_strategy_type