Squashed 'third_party/ceres/' changes from 399cda773..7f9cc571b

7f9cc571b Set CMAKE_CUDA_ARCHITECTURES depending on CUDAToolkit_VERSION
6a74af202 Remove a level of indirection when using CellInfo
f8c2994da Drop Ubuntu 20.04 and add Ubuntu 24.04 support
20954e693 Eliminate CUDA set but unused variable warning
57aba3ed0 Enable Apple linker library deduplication
1f15197be Drop macos-11 runner and support macos-14 instead
5de0fda0f Update Github actions
308a5bb43 Add missing include
715865101 Remove 32-bit MinGW from CI matrix
f71181a92 Remove remaining references to CXSparse
522210a08 Reuse macro to format version string
cd2dd06e9 Fix clang16 compiler warnings
1f2e6313a Add missing std qualifiers
125c06882 Link static cuda libs when ceres is build static
62c03d6ff Revert "Update Eigen::BDCSVD usage to comply with Eigen 3.4"
4027f6997 Update Eigen::BDCSVD usage to comply with Eigen 3.4
1b2ebabf5 Typo fixes in the documentation
2ffeb943a Fix typo in AutoDiffManifold comment and docs
da34da3da Remove CreateFakeBundleAdjustmentPartitionedJacobian
85b2c418a ClangTidy fixes
91773746b Simplify instantiation of cost functions and their functors
8b88a9ab4 Use C++17 Bessel functions
84436f791 Use native CMake TBB package configuration
556a56f21 Do not assume Homebrew usage
3fd2a72cc MinGW no longer provides a 32-bit METIS package
e776a6f1a Unbreak the Bazel build.
095d48392 Skip structure detection on preprocessor failure
a0876309a Fix typos in comments
774973731 Fix search on ceres-solver.org
85331393d Update docs for 2.2.0.
2120eae67 Optimize the computation of the LM diagonal in TinySolver
611b139b1 Fix Solver::Options::callbacks type in documentation
b652d3b4f Unbreak the windows build
b79c4d350 ClangTidy fixes
76af132d0 Update docs for 2.2.0RC3
dc7a85975 Single-threaded operations on small vectors
b379ab768 Remove MaxNumThreadsAvailable
354002f98 Schedule task in ParallerFor from the previous task
a9b3fcff4 Minor update to docs
5ccab18be Drop use of POSIX M_PI_2 and M_PI_4
4519b8d77 Drop use of POSIX M_PI
b83abdcb1 Add a default value for Solver::Summary::linear_solver_ordering_type
8d875a312 Fix checks for CUDA memory pools support
94335e3b9 More ClangTidy fixes
399395c4f Miscellaneous ClangTidy fixes
c8bed4b93 Update version_history for 2.2.0rc2
489339219 Rework MSVC warning suppression
0cea191d4 Move stream-ordered memory allocations
8e3b7d89e Fix a copy-pasta error
dc0bb8508 Various cleanups to the documentation
4588b0fbb Add an example for EvaluationCallback
0fc3fd4fc Add documentation for examples
e6b2f532b Parallelize PSE preconditioner
41672dff8 Add end-to-end BA tests for SCHUR_POWER_SERIES_EXPANSION
83ee376d8 Add an example for IterationCallback
371265094 Cleanup example code
59182a42c Update documentation
d4db6e6fe Fix typos in the documentation for EvaluationCallback
dffd8cd71 Add an accessor for the CostFunctor in DynamicAutoDiffCostFunction
bea247701 Add a missing include dir to the cuda kernels target.
18ea7d1c2 Runtime check for cudaMallocAsync support
a227045be Remove cuda-memcheck based tests
d10e786ca Remove an unused variable from CudaSparseMatrix
5a30cae58 Preparing for 2.2.0rc1
9cca67127 Enable compatibility with SuiteSparse 7.2.0
a1c02e8d3 Rework the Sphinx find module
a57e35bba Require at least CMake 3.16
863db948f Eliminate macOS sprintf warning
6f5342db6 Export ceres_cuda_kernels to project's build tree
d864d146f Add macOS 13 runner to Github workflows
01a23504d Add a workaround for CMake Policy CMP0148
de9cbde95 Work around MinGW32 manifold_test segfault
5e4b22f7f Update CudaSparseMatrix class
ed9921fc2 Fix Solver::Options in documentation
de62bf220 Two minor fixes
5f97455be Fix typos in documentation
5ba62abec Add CUDA support to windows CI builds
a98fdf582 Update CMakeLists.txt to fix Windows CUDA build
ec4907399 Fix block-sparse to crs conversion on windows
799ee91bb Fix check in CompressedRowJacobianWriter::CreateJacobian()
ee90f6cbf Detect large Jacobians and return failure instead of crashing.
310a252fb Deal with infinite initial cost correctly.
357482db7 Add NumTraits::max_digits10 for Jets
908b5b1b5 Fix type mismatch in documentation
75bacedf7 CUDA partitioned matrix view
fd6197ce0 Fixed GCC 13.1 compilation errors
df97a8f05 Improve support of older CUDA toolkit versions
085214ea7 Fix test CompressedRowSparseMatrix.Transpose
d880df09f Match new[] with delete[] in BSM
bdee4d617 Block-sparse to CRS conversion using block-structure
0f9de3daf Use page locked memory in BlockSparseMatrix
e7bd72d41 Permutation-based conversion from block-sparse to crs
abbc4e797 Explicitly compute number of non-zeros in row
96fdfd2e7 Implement tests for Euler conversion with jets
db1ebd3ff Work around lack of constexpr constructors for Jet
16a4fa04e Further Jet conversion fixes
92ad18b8a Fix a Jet conversion bug in rotation.h
a5e745d4e ClangTidy fixes
77ad8bb4e Change storage in BlockRandomAccessSparseMatrix
d340f81bd Clang Tidy fixes
54ad3dd03 Reorganize ParallelFor source files
ba360ab07 Change the value of BlockRandomAccessSparseMatrix::kMaxRowBlocks
0315c6ca9 Provide DynamicAutoDiffCostFunction deduction guide
3cdfae110 Replace Hertzberg mentions with citations
0af38a9fc Fix typos in documentation
4c969a6c1 Improve image of loss functions shape
b54f05b8e Add missing TukeyLoss to documentation
8bf4a2f42 Inexact check for ParallelAssign test
9cddce73a Explicit conversions from long to int in benchmarks (for num_threads)
5e787ab70 Using const ints in jacobian writers
f9bffbb6f Removing -Wshorten-64-to-32 warnings from examples (part 1)
e269b64f5 More ClangTidy fixes
f4eb768e0 Using int64 in file.cc. Fixing compilation error in array_utils.cc
74a0f0d24 Using int64_t for sizes and indexes in array utils
749a442d9 Clang-Tidy fixes
c4ba975ae Fix rotation_test.cc to work with older versions of Eigen
9602ed7b7 ClangFormat changes
79a554ffc Fix a bug in QuaternionRotatePoint.
f1113c08a Commenting unused parameters for better readibility
772d927e1 Replacing old style typedefs with new style usings
53df5ddcf Removing using std::...
e1ca3302a Increasing bazel timeout for heavy BA tests
f982d3071 Fixing bazel build
cb6b30662 Use hypot to compute the L^2 norm
1e2a24a8b Update Github actions to avoid deprecation warnings
285e5f9f4 Do not update brew formulae upon install
73d95b03f Clang-Tidy fixes
51d52c3ea Correct epsilon in CUDA QR test changed by last commit.
546f5337b Updates to CUDA dense linear algebra tests
19ab2c179 BlockRandomAccessMatrix Refactor
a3a062d72 Add a missing header
2b88bedb2 Remove unused variables
9beea728f Fix a bug in CoordinateDescentMinimizer
8e5d83f07 ClangFormat and ClangTidy changes
b15851508 Parallel operations on vectors
2fd81de12 Add build configuration with CUDA on Linux
06bfe6ffa Remove OpenMP and No threading backends.
352b320ab Fixed SuiteSparse 6.0 version parsing
8fd7828e3 ClangTidy fixes
0424615dc Fix PartitionedMatrixView usage in evaluation_benchmark
77a54dd3d Parallel updates to block-diagonal EtE FtF
946fa50de ClangTidy fixes
e4bef9505 Refactor PartitionedMatrixView to cache the partitions
d3201798e Clean up sparse_cholesky_test
addcd342f ClangTidy fixes
c2e7002d2 Remove an unused variable from evaluation_benchmark.cc
fef6d5875 Parallel left products for PartitionedMatrixView
37a3cb384 Update SuiteSparse in MSVC Github workflow
5d53d1ee3 Parallel for with iteration costs and left product
9aa52c6ff Use FindCUDAToolkit for CMake >= 3.17
47e03a6d8 Add const accessor for Problem::Options used by Problem
d6a931009 Clang Tidy Fixes
9364e31ee Fix a regression in SuiteSparse::AnalyzeCholesky
89b3e1f88 Remove unused includes of gflags and gtest
984079003 Fix missing regex dependency for gtest on QNX
6b296f27f Fix missing namespace qualification and docs for Manifold gtest macro
6685e629f AddBlockStructureTranspose to BlockSparseMatrix
699e3f3b3 Fix a link error in evaluation_benchmark.cc
19a3d07f9 Add multiplication benchmarks on BAL data
b221b1294 Format code with clang-format.
ccf32d70c Purge all remaining references to (defunct) LocalParameterization
5f8c406e2 struct ContextImpl -> class ContextImpl
9893c534c Several cleanups.
a78a57472 ClangTidy fixes
b1fe60330 Parallel right products for partitioned view
16668eedf Fix a memory leak in ContextImpl
d129938d5 Third time is the charm
4129b214a More fixes to cuda_dense_cholesky_test.cc
5c01d2573 Remove unused variables from cuda_dense_cholesky_test.cc
5e877ae69 Fix the Bazel build
d89290ffa Fix evalution_benchmark compilability
ae7f456e3 ClangTidy fixes
9438c370f Restore the semantics of TrustRegionMinimizer
c964fce90 Fix bug in cuda_kernels_test
afaad5678 Fix a typo
b7116824b Evaluation benchmark
2b89ce66f Add generalized Euler Angle conversions
8230edc6c ClangTidy fixes
9a2894763 Speed up locking when num_threads = 1.
739f2a25a Parallelize block_jacobi_preconditioner
c0c4f9394 Change implementation of parallel for
fc826c578 CUDA Cleanup
660af905f Fix a bug in TrustRegionMinimizer.
4cd257cf4 Let NumericDiffFirstOrderFunction take a dynamically sized parameter vector
6c27ac6d5 Fix repeated SpMV Benchmark
430a292ac Add a missing include
858b4b89b More ClangTidy fixes
e9e995740 ClangTidy fixes
7f8e930a0 Fix lint errors
f86a3bdbe Unify Block handling across matrix types
5f1946879 clang-formated source
00a05cf70 CUDA SDK Version-based SpMV Selection
de0f74e40 Optimize  the BlockCRSJacobiPreconditioner
ba65ddd31 Improvements to benchmarks
42352e2e2 Added CUDA Jacobi Preconditioner
f802a09ff &foo[0] -> foo.data()
344929647 Add CUDA GPU and Runtime Detection
6085e45be Minor CUDA cleanup.
e15ec89f3 Speed up bundle_adjuster
058a72782 One more ClangTidy fix.
f6f2f0d16 ClangTidy cleanups Also some clang-format cleanups.
9d74c6913 Fix some more errant CATD warnings
a4f744095 Remove an unused variable from compressed_row_sparse_matrix.cc
388c14286 Fix GCC 12.1.1 LTO -Walloc-size-larger-than= warnings
9ec4f7e44 Refactor BlockJacobiPreconditioner
adda97acd Fixed a few more missing CERES_NO_CUDA guards
22aeb3584 Fix a missing string assignment in solver.cc
3b891f767 Insert missing CUDA guards.
829089053 CUDA CGNR, Part 4: CudaCgnrSolver
6ab435d77 Fix a missing CERES_NO_CUDA guard
c560bc2be CUDA CGNR, Part 3: CudaSparseMatrix
c914c7a2b CUDA CGNR, Part 2: CudaVector
3af3dee18 Simplify the implementation to convert from BlockSparseMatrix to CompressedRowSparseMatrix.
242fc0795 Remove unnecessary destructors
737200ac8 Add macos-12 to Github workflow runners
9c968d406 Silence Clang warning
2c78c5f33 Small naming fixups.
2a25d86b0 Integrate schur power series expansion options to bundle adjuster
4642e4b0c Use if constexpr and map SparseMatrix as const
92d837953 Enable usage of schur power series expansion preconditioner.
f1dfac8cd Reduce the number of individual PRNG instances
79e403b15 Expand vcpkg installation instructions
7b0bb0e3f ClangTidy cleanups
c5c2afcc9 Fix solver_test.cc for preconditioners and sparse linear algebra libraries
07d333fb6 Refactor options checking for linear solvers
ba7207b0b A number of small changes.
20e85bbe3 Add power series expansion preconditioner
04899645c LinearOperator::FooMultiply -> LinearOperator::FooMultiplyAndAccumulate
288a3fde6 Add missing virtual destructors to matrix adapters
6483a2b4c Add Sergiu's name to the list of maintainers
1cf49d688 Update FindGlog.cmake to create glog::glog target
1da72ac39 Refactor ConjugateGradientsSolver
f62dccdb3 Fix the Sphere and Line Manifold formulations
3e1cc89f6 A bunch of clang-tidy fixes.
80380538a One more CATD fix
560ef46fb A bunch of minor fixes.
67bae28c1 CUDA CGNR, Part 1: Misc. CLeanup
5d0bca14d Remove ceres/internal/random.h in favor of <random>
d881b5ccf Minor fixes in comments
37516c968 Fix a bug in InnerProductComputer.
d8dad14ee CUDA Cleanup
c9d2ec8a9 Updates to sparse block matrix structures to support new sparse linear solvers.
5fe0bd45a Added MinGW to Windows Github workflow
738c027c1 Fix a logic error in iterative_refiner_test
cb6ad463d Add mixed precision support for CPU based DenseCholesky
df55682ba Fix Eigen error in 2D sphere manifolds Since Eigen does not allow to have a RowMajor column vector (see https://gitlab.com/libeigen/eigen/-/issues/416), the storage order must be set to ColMajor in that case. This fix adds that special case when generating 2D sphere manifolds.
1cf59f61e Set Github workflow NDK path explicitly
68c53bb39 Remove ceres::LocalParameterization
2f660464c Fix build issue with CUDA testing targets when compiling without gflags.
c801192d4 Minor fixes
ce9e902b8 Fix missing CERES_METIS_VERSION
d9a3dfbf2 Add a missing ifdef guard to dense_cholesky_test
5bd43a1fa Speed up DenseSparseMatrix::SquareColumnNorm.
cbc86f651 Fix the build when CUDA is not present
5af8e6449 Update year in solver.h
88e08cfe7 Mixed-precision Iterative Refinement Cholesky With CUDA
290b34ef0 Fix optional SuiteSparse + METIS test-suite names to be unique
d038e2d83 Fix use of NESDIS with SuiteSparse in tests if METIS is not found
027e741a1 Eliminated MinGW warning
4e5ea292b Fixed MSVC 2022 warning
83f6e0853 Fix use of conditional preprocessor checks within a macro in tests
70f1aac31 Fix fmin/fmax() when using Jets with float as their scalar type
5de77f399 Fix reporting of METIS version
11e637667 Fix #ifdef guards around METIS usage in EigenSparse backend and tests
0c88301e6 Provide optional METIS support
f11c25626 Fix fmin/fmax() to use Jet averaging on equality
b90053f1a Revert C++17 usage of std::exclusive_scan
dfce1e128 Link against threading library only if necessary
69eddfb6d Use find module to link against OpenMP
b4803778c Update documentation for linear_solver_ordering_type
2e764df06 Update Cuda memcheck test
443ae9ce2 Update Cuda memcheck test
55b4c3f44 Retain terminal formatting when building docs
786866d9f Generate version string at compile time
5bd83c4ac Unbreak the build with EIGENSPARSE is disabled
2335b5b4b Remove support for CXSparse
fbc2eea16 Nested dissection for ACCELERATE_SPARSE & EIGEN_SPARSE
d87fd551b Fix Ubuntu 20.04 workflow tests
71717f37c Use glog 0.6 release to run Windows Github workflow
66e0adfa7 Fix detection of sphinx-rtd-theme
d09f7e9d5 Enable postordering when computing the sparse factorization.
9b34ecef1 Unbreak the build on MacOS
8ba8fbb17 Remove Solver::Options::use_postordering
30b4d5df3 Fix the ceres.bzl to add missing cc files.
39ec5e8f9 Add Nested Dissection based fill reducing ordering
aa62dd86a Fix a build breakage
41c5fb1e8 Refactor suitesparse.h/cc
12263e283 Make the min. required version of SuiteSparse to be 4.5.6
c8493fc36 Convert internal enums to be class enums.
bb3a40c09 Add Nested Dissection ordering method to SuiteSparse
f1414cb5b Correct spelling in comments and docs.
fd2b0ceed Correct spelling (contiguous, BANS)
464abc198 Run Linux Github workflow on Ubuntu 22.04
caf614a6c Modernize code using c++17 constructs
be618133e Simplify some template metaprograms using fold expressions.
3b0096c1b Add the ability to specify the pivot threshold in Covariance::Options
40c1a7e18 Fix Github workflows
127474360 Ceres Solver now requires C++17
b5f1b7877 clang-format cleanup
32cd1115c Make the code in small_blas_generic.h more compiler friendly.
f68321e7d Update version history
b34280207 Fix MSVC small_blas_test failures
b246991b6 Update the citation instructions in the docs
c0c14abca Fix version history item numbering
d23dbac25 Update Windows install guide
e669c9fc7 Provide citation file
ff57c2e91 Update version history for 2.1.0rc2
ab9436cb9 Workaround MSVC STL deficiency in C++17 mode
97c232857 Update the included gtest to version 1.11.0
4eac7ddd2 Fix Jet lerp test regression
2ffbe126d Fix Jet test failures on ARMv8 with recent Xcode
0d6a0292c Fix unused arguments of Make1stOrderPerturbation
93511bfdc Fix SuiteSparse path and version reporting
bf329b30f Fix link to macOS badge
0133dada2 Add Github workflows
3d3d6ed71 Add missing includes
0a9c0df8a Fix path for cuda-memcheck tests
ee35ef66f ClangFormat cleanup via scripts/all_format.sh
470515985 Add missing includes for config.h
d3612c12c Set CMP0057 policy for IN_LIST operator in FindSuiteSparse.cmake
4bc100c13 Do not define unusable import targets
e91995cce Fix Ubuntu 18.04 shared library build
94af09186 Force C++ linker
a65e73885 Update installation docs
1a377d707 Fix Ubuntu 20.04 SPQR build
817f5a068 Switch to imported SuiteSparse, CXSparse, and METIS targets
b0f32a20d Hide remaining internal symbols
572395098 Add a missing include
b0aef211d Allow to store pointers in ProductManifold
9afe8cc45 Small compile fix to context_impl
284be88ca Allow ProductManifold default construction
f59059fff Bugfix to CUDA workspace handling
779634164 Fix MSVC linker error
7743d2e73 Store ProductManifold instances in a tuple
9f32c42ba Update Travis-CI status badge to .com from .org
eadfead69 Move LineManifold and SphereManifold into their own headers.
f0f8f93bb Fix docs inconsistencies
e40391efa Update version history in preparation for 2.1.0
6a37fbf9b Add static/compile time sizing to EuclideanManifold
4ad787ce1 Fix the bazel build
ae4d95df6 Two small clang-tidy fixes
f0851667b Fix MSVC compilation errors
c8658c899 Modernize more
46b3495a4 Standardize path handling using GNUInstallDirs
98bc3ca17 Fix shared library build due to missing compile features specification
8fe8ebc3a Add final specifier to public classes
84e1696f4 Add final specifier to internal classes.
518970f81 Context should be exported
09ec4997f Cleanup examples
90e58e10f Add missing #include.
15348abe9 Add CUDA based bundle adjustment tests.
57ec9dc92 Do not enforce a specific C++ standard
99698f053 Fix Apple Clang weak symbols warnings
8e0842162 Add support for dense CUDA solvers #3
aff51c907 Revert "Do not enforce a specific C++ standard"
527c3f7da Fixed gflags dependency
d839b7792 Do not enforce a specific C++ standard
f71167c62 Fixed missing include in minilog build
47502b833 Miscellaneous CUDA related changes.
bb2996681 Check CUDA is available in solver.cc
7d2e4152e Add support for dense CUDA solvers #2
f90833f5f Simplify symbol export
c6158e0ab Replace NULL by nullptr
7e4f5a51b Remove blas.h/cc as they are not used anymore.
e0fef6ef0 Add cmake option ENABLE_BITCODE for iOS builds
446487c54 Add <memory> header to all files using std::unique_ptr.
9c5f29d46 Use compiler attributes instead of [[deprecated]]
44039af2c Convert factory functions to return std::unique_ptrs.
708a2a723 Silence LocalParameterization deprecation warnings
de69e657a Fix another missing declaration warning
677711138 Fix segmentation fault in AVX2 builds
0141ca090 Deprecate LocalParameterizations
fdfa5184a Fix missing declaration warning
4742bf386 Add LineManifold.
c14f360e6 Drop trivial special members
ae65219e0 ClangTidy cleanups
a35bd1bf9 Use = default for trivial special members
db67e621e Fix dense_cholesky_test.cc comma handling.
484d3414e Replace virtual keyword by override
2092a720e Fix some nits.
36d6d8690 Add support for dense CUDA solvers #1
af5e48c71 Add SphereManifold.
182cb01c5 Normalize Jet classification and comparison
9dbd28989 Loosen tolerances in dense_qr_test.cc
cab853fd5 Add DenseQR Interface
8ae054ad9 Fix missing declaration warning in autodiff_manifold_test
408af7b1a Move the constructor and destructor for SchurComplementSolver
d51672d1c Move the constructor and destructor for DenseSchurComplementSolver
177b2f99d Use benchmark version 1.6 compatible syntax.
ce9669003 Add const accessor for functor wrapped by auto/numeric-diff objects
9367ec9cc Add a benchmark for dense linear solvers.
7d6524daf Support fma Jet
a0d81ad63 Fix a bug in AutoDiffManifold
5a99e42e1 ClangTidy fixes
40fb41355 Update .gitignore
e6e6ae087 Unbreak the bazel build
0572efc57 Fix a compilation warning in autodiff_manifold.h
475db73d0 Fix build breakage when LAPACK support is disabled.
095c9197f Fix iterative_refiner_test.cc
6d06e9b98 Add DenseCholesky
77c0c4d09 Migrate examples to use Manifolds
19eef54fc Rename Quaternion to QuaternionManifold. Also rename EigenQuaternion to EigenQuaternionManiold.
ca6d841c2 Add AutoDiffManifold
97d7e0737 Move the manifold testing matchers to manifold_test_utils.h
16436b34b Fix some more clang-tidy suggestions.
dcdefc216 Fix a bunch of clang-tidy suggestions.
d8a1b69ab Remove an unused variable from gradient_checker.cc
611b46b54 Remove the use of CHECK_NOTNULL.
125a0e9be LocalParameterization -> Manifold #1
00bfbae11 Add missing algorithm header to manifold.cc
1d5aff059 Refactor jet_tests.cc
fbd693091 Fix two unused variable warnings.
c0cb42e5f Add Problem::HasParameterization
7e2f9d9d4 Add EigenQuaternion manifold and tests for it.
b81a8bbb7 Add the Quaternion manifold and tests.
4a01dcb88 Add more invariants and documentation to manifold_test.cc
bdd80fcce Improve Manifold testing
ce1537030 Fixed missing headers in manifold.h
23b204d7e LocalParameterization -> Manifold #1
c2fab6502 Fix docs of supported sparse backends for mixed_precision_solves option
8cb441c49 Fix missing declaration warnings in GCC
d2b7f337c Remove split.cc from the bazel sources.
206061a6b Use standard c++ types in jet_test.cc
1f374a9fe Support promotion in comparison between Jet and scalars
06e68dbc5 Avoid midpoint overflow in the differential
276d24c73 Fix C++20 compilation
b1391e062 Support midpoint Jet
8426526df Support lerp Jet
57c279689 support 3-argument hypot jet
123fba61c Eigen::MappedSparseMatrix -> Eigen::Map<Eigen::SparseMatrix>
3f950c66d reworked copysign tests
48cb54d1b fix fmin and fmax NaN handling
552a4e517 support log10 jet
4e49c5422 reuse expm1 result for differential
8d3e64dd5 Use modern-style Eigen3 CMake variables
2fba61434 support log1p and expm1 jet
a668cabbc support norm jet
a3a4b6d77 support copysign jet
b75dac169 fix abs jet test comment
034bf566f use copysign for abs jet
31008453f Add example for BiCubicInterpolator
7ef4a1221 Add a section on implicit and inverse function theorems
686428f5c Move the further reading section to bibliography
e47d87fdd Add a note about Trigg's correction
d2852518d Fix the docs for Problem::RemoveResidualBlock & Problem::RemoveParameterBlock
06e02a173 Delete unused files split.h/cc
ac7268da5 Fix an 80cols issue in covariance_impl.cc
17dccef91 Add NumericDiffFirstOrderFunction
03d64141a Fix a incorrect check in reorder_program.cc
884111913 Two changes to TinySolver
4dff3ea2c Fix a number of typos in rotation.h
98719ced4 Fix a type in interfacing_with_autodiff.html
0299ce944 Update conf.py to be compatible with Sphinx 4.1.2
2a2b9bd6f Fix a bug in covariance_impl.cc
27fade7b8 Fix a bug in system_test.cc
d4eb83ee5 Fix the Jacobian in trust_region_minimizer_test.cc
5f6071a1c Fix a bug in local_parameterization_test.cc
b2e732b1e Fix errors in comments from William Gandler.
42f1d6717 Add accessors to GradientProblem
aefd37b18 Refactor small_blas_gemm_benchmark
dc20db303 [docs] Fix `IterationSummary` section. Add missing `IterationCallback`
90ba7d1ef [docs] Fix typos
c3129c3d4 Fix tests not executing
7de561e8e Fix dependency check for building documentation
4fbe218f2 Refactor small_blas_test
3fdde8ede Remove an errant double link.
20ad431f7 Fixing a typo in the version history
0c85c4092 Revert "Reduce copies involved in Jet operations"
3a02d5aaf Fix typo in LossFunctionWrapper sample code
7b2c223be Add fmax/fmin overloads for scalars
c036c7819 Reduce copies involved in Jet operations
51945e061 Introduce benchmark for Jet operations
ec4f2995b Do not check MaxNumThreadsAvailable if the thread number is set to 1.
98f639f54 Add a macro CERES_GET_FLAG.
766f2cab5 Reduce log spam in covariance_impl.cc.
941ea1347 Fix FindTBB version detection with TBB >= 2021.1.1
323c350a6 fix Eigen3_VERSION
2b32b3212 Revert "Group specializations into groups of four"
313caf1ae Allow Unity build.
4ba244cdb Group specializations into groups of four
d77a8100a Make miniglog's InitGoogleLogging argument const.
863724994 Use portable expression for constant 2/sqrt(pi)
97873ea65 Add some missing includes for glog/logging.h
d15b1bcd3 Increase tolerance in small_blas_test.cc
17cf01831 Hide 'format not a string literal' error in examples
64029909b Fix -Wno-maybe-uninitialized error
21294123d Fix nonnull arg compared to NULL error.
1dd417410 Fix -Wno-format-nonliteral
6c106bf51 Fix -Wmissing-field-initializers error
c48a32792 Use cc_binary includes so examples build as external repo
e0e14a5cd Fix errors found by -Werror
e84cf10e1 Fix an errant double in TinySolver.
66b4c33e8 updated unit quaternion rotation
d45ec47b5 Fix a typo in schur_eliminator.h

Change-Id: I60db062e44d051d50dbb3a145eec2f74d5190481
git-subtree-dir: third_party/ceres
git-subtree-split: 7f9cc571b03632f1df93ea35725a1f5dfffe2c72
Signed-off-by: Austin Schuh <austin.linux@gmail.com>
diff --git a/docs/source/nnls_solving.rst b/docs/source/nnls_solving.rst
index 285df3a..184b94b 100644
--- a/docs/source/nnls_solving.rst
+++ b/docs/source/nnls_solving.rst
@@ -1,3 +1,4 @@
+.. highlight:: c++
 
 .. default-domain:: cpp
 
@@ -12,10 +13,10 @@
 Introduction
 ============
 
-Effective use of Ceres requires some familiarity with the basic
+Effective use of Ceres Solver requires some familiarity with the basic
 components of a non-linear least squares solver, so before we describe
 how to configure and use the solver, we will take a brief look at how
-some of the core optimization algorithms in Ceres work.
+some of the core optimization algorithms in Ceres Solver work.
 
 Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
 variables, and
@@ -27,15 +28,15 @@
           L \le x \le U
   :label: nonlinsq
 
-Where, :math:`L` and :math:`U` are lower and upper bounds on the
-parameter vector :math:`x`.
+Where, :math:`L` and :math:`U` are vector lower and upper bounds on
+the parameter vector :math:`x`. The inequality holds component-wise.
 
 Since the efficient global minimization of :eq:`nonlinsq` for
 general :math:`F(x)` is an intractable problem, we will have to settle
 for finding a local minimum.
 
 In the following, the Jacobian :math:`J(x)` of :math:`F(x)` is an
-:math:`m\times n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)`
+:math:`m\times n` matrix, where :math:`J_{ij}(x) = D_j f_i(x)`
 and the gradient vector is :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2
 = J(x)^\top F(x)`.
 
@@ -75,7 +76,7 @@
 Trust region methods are in some sense dual to line search methods:
 trust region methods first choose a step size (the size of the trust
 region) and then a step direction while line search methods first
-choose a step direction and then a step size. Ceres implements
+choose a step direction and then a step size. Ceres Solver implements
 multiple algorithms in both categories.
 
 .. _section-trust-region-methods:
@@ -152,8 +153,9 @@
 solving an unconstrained optimization of the form
 
 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda  \|D(x)\Delta x\|^2
+   :label: lsqr-naive
 
-Where, :math:`\lambda` is a Lagrange multiplier that is inverse
+Where, :math:`\lambda` is a Lagrange multiplier that is inversely
 related to :math:`\mu`. In Ceres, we solve for
 
 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
@@ -162,39 +164,47 @@
 The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
 the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.
 
-Before going further, let us make some notational simplifications. We
-will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been concatenated
-at the bottom of the matrix :math:`J` and similarly a vector of zeros
-has been added to the bottom of the vector :math:`f` and the rest of
-our discussion will be in terms of :math:`J` and :math:`F`, i.e, the
-linear least squares problem.
+Before going further, let us make some notational simplifications.
+
+We will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been
+concatenated at the bottom of the matrix :math:`J(x)` and a
+corresponding vector of zeroes has been added to the bottom of
+:math:`F(x)`, i.e.:
+
+.. math:: J(x) = \begin{bmatrix} J(x) \\ \frac{1}{\sqrt{\mu}} D
+          \end{bmatrix},\quad F(x) = \begin{bmatrix} F(x) \\ 0
+          \end{bmatrix}.
+
+This allows us to re-write :eq:`lsqr` as
 
 .. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2 .
    :label: simple
 
-For all but the smallest problems the solution of :eq:`simple` in
-each iteration of the Levenberg-Marquardt algorithm is the dominant
-computational cost in Ceres. Ceres provides a number of different
-options for solving :eq:`simple`. There are two major classes of
-methods - factorization and iterative.
+and only talk about :math:`J(x)` and :math:`F(x)` going forward.
+
+For all but the smallest problems the solution of :eq:`simple` in each
+iteration of the Levenberg-Marquardt algorithm is the dominant
+computational cost. Ceres provides a number of different options for
+solving :eq:`simple`. There are two major classes of methods -
+factorization and iterative.
 
 The factorization methods are based on computing an exact solution of
-:eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
-step Levenberg-Marquardt algorithm. But it is not clear if an exact
-solution of :eq:`lsqr` is necessary at each step of the LM algorithm
-to solve :eq:`nonlinsq`. In fact, we have already seen evidence
-that this may not be the case, as :eq:`lsqr` is itself a regularized
-version of :eq:`linearapprox`. Indeed, it is possible to
-construct non-linear optimization algorithms in which the linearized
-problem is solved approximately. These algorithms are known as inexact
-Newton or truncated Newton methods [NocedalWright]_.
+:eq:`lsqr` using a Cholesky or a QR factorization and lead to the
+so-called exact step Levenberg-Marquardt algorithm. But it is not clear
+if an exact solution of :eq:`lsqr` is necessary at each step of the
+Levenberg-Marquardt algorithm. We have already seen evidence that
+this may not be the case, as :eq:`lsqr` is itself a regularized
+version of :eq:`linearapprox`. Indeed, it is possible to construct
+non-linear optimization algorithms in which the linearized problem is
+solved approximately. These algorithms are known as inexact Newton or
+truncated Newton methods [NocedalWright]_.
 
 An inexact Newton method requires two ingredients. First, a cheap
 method for approximately solving systems of linear
 equations. Typically an iterative linear solver like the Conjugate
-Gradients method is used for this
-purpose [NocedalWright]_. Second, a termination rule for
-the iterative solver. A typical termination rule is of the form
+Gradients method is used for this purpose [NocedalWright]_. Second, a
+termination rule for the iterative solver. A typical termination rule
+is of the form
 
 .. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
    :label: inexact
@@ -212,14 +222,18 @@
 iterative linear solver, the inexact step Levenberg-Marquardt
 algorithm is used.
 
+We will talk more about the various linear solvers that you can use in
+:ref:`section-linear-solver`.
+
 .. _section-dogleg:
 
 Dogleg
 ------
 
 Another strategy for solving the trust region problem :eq:`trp` was
-introduced by M. J. D. Powell. The key idea there is to compute two
-vectors
+introduced by
+`M. J. D. Powell <https://en.wikipedia.org/wiki/Michael_J._D._Powell>`_. The
+key idea there is to compute two vectors
 
 .. math::
 
@@ -253,10 +267,14 @@
 Levenberg-Marquardt solves the linear approximation from scratch with
 a smaller value of :math:`\mu`. Dogleg on the other hand, only needs
 to compute the interpolation between the Gauss-Newton and the Cauchy
-vectors, as neither of them depend on the value of :math:`\mu`.
+vectors, as neither of them depends on the value of :math:`\mu`. As a
+result the Dogleg method only solves one linear system per successful
+step, while Levenberg-Marquardt may need to solve an arbitrary number
+of linear systems before it can make progress [LourakisArgyros]_.
 
-The Dogleg method can only be used with the exact factorization based
-linear solvers.
+A disadvantage of the Dogleg implementation in Ceres Solver is that
+it can only be used with exact factorization based linear solvers.
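+
+A minimal sketch of enabling the Dogleg strategy follows, in the
+option-setting style used elsewhere in this chapter. ``dogleg_type``
+and its two values are the companion options in Ceres Solver;
+``SUBSPACE_DOGLEG`` solves the trust region problem over the two
+dimensional subspace spanned by the Gauss-Newton and Cauchy vectors.
+
+.. code-block:: c++
+
+   Solver::Options::trust_region_strategy_type = DOGLEG;
+   // TRADITIONAL_DOGLEG is Powell's method.
+   Solver::Options::dogleg_type = TRADITIONAL_DOGLEG;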
 
 .. _section-inner-iterations:
 
@@ -349,10 +367,10 @@
 enables the non-monotonic trust region algorithm as described by Conn,
 Gould & Toint in [Conn]_.
 
-Even though the value of the objective function may be larger
-than the minimum value encountered over the course of the
-optimization, the final parameters returned to the user are the
-ones corresponding to the minimum cost over all iterations.
+Even though the value of the objective function may be larger than the
+minimum value encountered over the course of the optimization, the
+final parameters returned to the user are the ones corresponding to
+the minimum cost over all iterations.
 
 The option to take non-monotonic steps is available for all trust
 region strategies.
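+
+A sketch of enabling non-monotonic steps; ``use_nonmonotonic_steps``
+is the switch described above, and the companion option
+``max_consecutive_nonmonotonic_steps`` (shown with its usual default)
+controls the length of the moving window over which progress is
+measured.
+
+.. code-block:: c++
+
+   Solver::Options::use_nonmonotonic_steps = true;
+   // Window over which the objective is allowed to be non-monotonic.
+   Solver::Options::max_consecutive_nonmonotonic_steps = 5;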
@@ -363,11 +381,13 @@
 Line Search Methods
 ===================
 
-The line search method in Ceres Solver cannot handle bounds
-constraints right now, so it can only be used for solving
-unconstrained problems.
+.. NOTE::
 
-Line search algorithms
+   The line search method in Ceres Solver cannot handle bounds
+   constraints right now, so it can only be used for solving
+   unconstrained problems.
+
+The basic line search algorithm looks something like this:
 
    1. Given an initial point :math:`x`
    2. :math:`\Delta x = -H^{-1}(x) g(x)`
@@ -387,7 +407,7 @@
 direction :math:`\Delta x` and the method used for one dimensional
 optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
 primary source of computational complexity in these
-methods. Currently, Ceres Solver supports three choices of search
+methods. Currently, Ceres Solver supports four choices of search
 directions, all aimed at large scale problems.
 
 1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
@@ -405,8 +425,8 @@
 3. ``BFGS`` A generalization of the Secant method to multiple
    dimensions in which a full, dense approximation to the inverse
    Hessian is maintained and used to compute a quasi-Newton step
-   [NocedalWright]_.  BFGS is currently the best known general
-   quasi-Newton algorithm.
+   [NocedalWright]_.  ``BFGS`` and its limited memory variant ``LBFGS``
+   are currently the best known general quasi-Newton algorithms.
 
 4. ``LBFGS`` A limited memory approximation to the full ``BFGS``
    method in which the last `M` iterations are used to approximate the
@@ -414,26 +434,31 @@
    [ByrdNocedal]_.
 
 Currently Ceres Solver supports both a backtracking and interpolation
-based Armijo line search algorithm, and a sectioning / zoom
-interpolation (strong) Wolfe condition line search algorithm.
-However, note that in order for the assumptions underlying the
-``BFGS`` and ``LBFGS`` methods to be guaranteed to be satisfied the
-Wolfe line search algorithm should be used.
+based `Armijo line search algorithm
+<https://en.wikipedia.org/wiki/Backtracking_line_search>`_
+(``ARMIJO``), and a sectioning / zoom interpolation (strong) `Wolfe condition line
+search algorithm <https://en.wikipedia.org/wiki/Wolfe_conditions>`_
+(``WOLFE``).
+
+.. NOTE::
+
+   In order for the assumptions underlying the ``BFGS`` and ``LBFGS``
+   methods to be satisfied the ``WOLFE`` algorithm must be used.
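+
+A line search minimizer satisfying these assumptions can be configured
+as follows (a sketch; the option names follow
+:class:`Solver::Options`, and the enum values are the ones named
+above):
+
+.. code-block:: c++
+
+   Solver::Options::minimizer_type = LINE_SEARCH;
+   Solver::Options::line_search_direction_type = LBFGS;
+   Solver::Options::line_search_type = WOLFE;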
 
 .. _section-linear-solver:
 
-LinearSolver
-============
+Linear Solvers
+==============
 
-Recall that in both of the trust-region methods described above, the
+Observe that for both of the trust-region methods described above, the
 key computational cost is the solution of a linear least squares
 problem of the form
 
-.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
+.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2 .
    :label: simple2
 
 Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
-f(x)`. For notational convenience let us also drop the dependence on
+F(x)`. For notational convenience let us also drop the dependence on
 :math:`x`. Then it is easy to see that solving :eq:`simple2` is
 equivalent to solving the *normal equations*.
 
@@ -444,31 +469,65 @@
 
 .. _section-qr:
 
-``DENSE_QR``
-------------
+DENSE_QR
+--------
 
 For small problems (a couple of hundred parameters and a few thousand
-residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
-of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
-:math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
-an upper triangular matrix [TrefethenBau]_. Then it can be shown that
-the solution to :eq:`normal` is given by
+residuals) with relatively dense Jacobians, QR-decomposition is the
+method of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition
+of :math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R`
+is an upper triangular matrix [TrefethenBau]_. Then it can be shown
+that the solution to :eq:`normal` is given by
 
 .. math:: \Delta x^* = -R^{-1}Q^\top f
 
+You can use QR-decomposition by setting
+:member:`Solver::Options::linear_solver_type` to ``DENSE_QR``.
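+
+For example (a minimal sketch in the option-setting style used later
+in this document; the dense backend choices are described below):
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = DENSE_QR;
+   // EIGEN is the default; LAPACK or CUDA may be selected if Ceres
+   // Solver was built with support for them.
+   Solver::Options::dense_linear_algebra_library_type = EIGEN;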
 
-Ceres uses ``Eigen`` 's dense QR factorization routines.
+By default (``Solver::Options::dense_linear_algebra_library_type =
+EIGEN``) Ceres Solver will use `Eigen Householder QR factorization
+<https://eigen.tuxfamily.org/dox-devel/classEigen_1_1HouseholderQR.html>`_.
 
-.. _section-cholesky:
+If Ceres Solver has been built with an optimized LAPACK
+implementation, then the user can also choose to use LAPACK's
+`DGEQRF`_ routine by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``LAPACK``. Depending on the `LAPACK` and the underlying `BLAS`
+implementation this may perform better than using Eigen's Householder
+QR factorization.
 
-``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
-------------------------------------------------------
+.. _DGEQRF: https://netlib.org/lapack/explore-html/df/dc5/group__variants_g_ecomputational_ga3766ea903391b5cf9008132f7440ec7b.html
 
-Large non-linear least square problems are usually sparse. In such
-cases, using a dense QR factorization is inefficient. Let :math:`H =
-R^\top R` be the Cholesky factorization of the normal equations, where
-:math:`R` is an upper triangular matrix, then the solution to
-:eq:`normal` is given by
+
+If an NVIDIA GPU is available and Ceres Solver has been built with
+CUDA support enabled, then the user can also choose to perform the
+QR-decomposition on the GPU by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``CUDA``. Depending on the GPU this can lead to a substantial
+speedup. Using CUDA only makes sense for moderate to large sized
+problems. This is because to perform the decomposition on the GPU the
+matrix :math:`J` needs to be transferred from the CPU to the GPU and
+this incurs a cost. So unless the speedup from doing the decomposition
+on the GPU is large enough to also account for the time taken to
+transfer the Jacobian to the GPU, using CUDA will not be better than
+just doing the decomposition on the CPU.
+
+.. _section-dense-normal-cholesky:
+
+DENSE_NORMAL_CHOLESKY
+---------------------
+
+It is often the case that the number of rows in the Jacobian :math:`J`
+is much larger than the number of columns. The complexity of QR
+factorization scales linearly with the number of rows, so beyond a
+certain size it is more efficient to solve :eq:`normal` using a dense
+`Cholesky factorization
+<https://en.wikipedia.org/wiki/Cholesky_decomposition>`_.
+
+Let :math:`H = R^\top R` be the Cholesky factorization of the normal
+equations, where :math:`R` is an upper triangular matrix, then the
+solution to :eq:`normal` is given by
 
 .. math::
 
@@ -479,69 +538,155 @@
 factorization of :math:`H` is the same upper triangular matrix
 :math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
 orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
-Q^\top Q R = R^\top R`. There are two variants of Cholesky
-factorization -- sparse and dense.
+Q^\top Q R = R^\top R`.
 
-``DENSE_NORMAL_CHOLESKY``  as the name implies performs a dense
-Cholesky factorization of the normal equations. Ceres uses
-``Eigen`` 's dense LDLT factorization routines.
+Unfortunately, forming the matrix :math:`H = J^\top J` squares the
+condition number. As a result while the cost of forming :math:`H` and
+computing its Cholesky factorization is lower than computing the
+QR-factorization of :math:`J`, we pay the price in terms of increased
+numerical instability and potential failure of the Cholesky
+factorization for ill-conditioned Jacobians.
 
-``SPARSE_NORMAL_CHOLESKY``, as the name implies performs a sparse
-Cholesky factorization of the normal equations. This leads to
-substantial savings in time and memory for large sparse
-problems. Ceres uses the sparse Cholesky factorization routines in
-Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_
-or the sparse Cholesky factorization algorithm in ``Eigen`` (which
-incidently is a port of the algorithm implemented inside ``CXSparse``)
+You can use dense Cholesky factorization by setting
+:member:`Solver::Options::linear_solver_type` to
+``DENSE_NORMAL_CHOLESKY``.
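+
+For example (a sketch, in the same option-setting style as above):
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = DENSE_NORMAL_CHOLESKY;
+   // As with DENSE_QR, the EIGEN, LAPACK and CUDA backends apply.
+   Solver::Options::dense_linear_algebra_library_type = EIGEN;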
+
+By default (``Solver::Options::dense_linear_algebra_library_type =
+EIGEN``) Ceres Solver will use `Eigen's LLT factorization`_ routine.
+
+.. _Eigen's LLT Factorization:  https://eigen.tuxfamily.org/dox/classEigen_1_1LLT.html
+
+If Ceres Solver has been built with an optimized LAPACK
+implementation, then the user can also choose to use LAPACK's
+`DPOTRF`_ routine by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``LAPACK``. Depending on the `LAPACK` and the underlying `BLAS`
+implementation this may perform better than using Eigen's Cholesky
+factorization.
+
+.. _DPOTRF: https://www.netlib.org/lapack/explore-html/d1/d7a/group__double_p_ocomputational_ga2f55f604a6003d03b5cd4a0adcfb74d6.html
+
+If an NVIDIA GPU is available and Ceres Solver has been built with
+CUDA support enabled, then the user can also choose to perform the
+Cholesky factorization on the GPU by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``CUDA``. Depending on the GPU this can lead to a substantial speedup.
+Using CUDA only makes sense for moderate to large sized problems. This
+is because to perform the decomposition on the GPU the matrix
+:math:`H` needs to be transferred from the CPU to the GPU and this
+incurs a cost. So unless the speedup from doing the decomposition on
+the GPU is large enough to also account for the time taken to transfer
+the Jacobian to the GPU, using CUDA will not be better than just doing
+the decomposition on the CPU.
+
+
+.. _section-sparse-normal-cholesky:
+
+SPARSE_NORMAL_CHOLESKY
+----------------------
+
+Large non-linear least square problems are usually sparse. In such
+cases, using a dense QR or Cholesky factorization is inefficient. For
+such problems, Cholesky factorization routines which treat :math:`H`
+as a sparse matrix and compute a sparse factor :math:`R` are better
+suited [Davis]_. This can lead to substantial savings in memory and
+CPU time for large sparse problems.
+
+You can use sparse Cholesky factorization by setting
+:member:`Solver::Options::linear_solver_type` to
+``SPARSE_NORMAL_CHOLESKY``.
+
+The use of this linear solver requires that Ceres is compiled with
+support for at least one of:
+
+ 1. `SuiteSparse <https://people.engr.tamu.edu/davis/suitesparse.html>`_ (``SUITE_SPARSE``).
+ 2. `Apple's Accelerate framework
+    <https://developer.apple.com/documentation/accelerate/sparse_solvers?language=objc>`_
+    (``ACCELERATE_SPARSE``).
+ 3. `Eigen's sparse linear solvers
+    <https://eigen.tuxfamily.org/dox/group__SparseCholesky__Module.html>`_
+    (``EIGEN_SPARSE``).
+
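+For example, to select the SuiteSparse backend (a sketch; substitute
+whichever of the three backends your build of Ceres supports):
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = SPARSE_NORMAL_CHOLESKY;
+   Solver::Options::sparse_linear_algebra_library_type = SUITE_SPARSE;
+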
+SuiteSparse and Accelerate offer high performance sparse Cholesky
+factorization routines as they use level-3 BLAS routines
+internally. Eigen's sparse Cholesky routines are *simplicial* and do
+not use dense linear algebra routines and as a result cannot compete
+with SuiteSparse and Accelerate, especially on large problems. To get
+the best performance out of SuiteSparse it should be linked against
+high quality BLAS and LAPACK implementations, e.g. `ATLAS
+<https://math-atlas.sourceforge.net/>`_, `OpenBLAS
+<https://www.openblas.net/>`_ or `Intel MKL
+<https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html>`_.
+
+A critical part of a sparse Cholesky factorization routine is the use
+of a fill-reducing ordering. By default Ceres Solver uses the Approximate
+Minimum Degree (``AMD``) ordering, which usually performs well, but
+there are other options that may perform better depending on the
+actual sparsity structure of the Jacobian. See :ref:`section-ordering`
+for more details.
 
 .. _section-cgnr:
 
-``CGNR``
---------
+CGNR
+----
 
-For general sparse problems, if the problem is too large for
-``CHOLMOD`` or a sparse linear algebra library is not linked into
-Ceres, another option is the ``CGNR`` solver. This solver uses the
-Conjugate Gradients solver on the *normal equations*, but without
-forming the normal equations explicitly. It exploits the relation
+For general sparse problems, if the problem is too large for sparse
+Cholesky factorization or a sparse linear algebra library is not
+linked into Ceres, another option is the ``CGNR`` solver. This solver
+uses the `Conjugate Gradients
+<https://en.wikipedia.org/wiki/Conjugate_gradient_method>`_ method on
+the *normal equations*, but without forming the normal equations
+explicitly. It exploits the relation
 
 .. math::
     H x = J^\top J x = J^\top(J x)
 
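
The relation above means that Conjugate Gradients never needs :math:`H` itself, only products :math:`Hx`. A minimal self-contained sketch of this implicit product (not Ceres code: the dense row-major :math:`J` and the ``NormalEquationsProduct`` helper are purely illustrative, whereas Ceres stores :math:`J` as a sparse block matrix):

```cpp
#include <vector>

// Evaluate H x = J^T (J x) without ever forming H = J^T J.
// J is a rows x cols matrix stored dense, row-major, for illustration only.
std::vector<double> NormalEquationsProduct(const std::vector<double>& J,
                                           int rows, int cols,
                                           const std::vector<double>& x) {
  // First product: Jx.
  std::vector<double> Jx(rows, 0.0);
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      Jx[i] += J[i * cols + j] * x[j];
    }
  }
  // Second product: accumulate J^T (J x).
  std::vector<double> Hx(cols, 0.0);
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      Hx[j] += J[i * cols + j] * Jx[i];
    }
  }
  return Hx;
}
```

For J = [1 2; 3 4] and x = (1, 1), the two products give Jx = (3, 7) and J^T(Jx) = (24, 34), the same result as forming H = J^T J = [10 14; 14 20] and multiplying, but without ever materializing H.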
-The convergence of Conjugate Gradients depends on the conditioner
-number :math:`\kappa(H)`. Usually :math:`H` is poorly conditioned and
-a :ref:`section-preconditioner` must be used to get reasonable
-performance. Currently only the ``JACOBI`` preconditioner is available
-for use with ``CGNR``. It uses the block diagonal of :math:`H` to
-precondition the normal equations.
+Because ``CGNR`` never solves the linear system exactly, when the user
+chooses ``CGNR`` as the linear solver, Ceres automatically switches
+from the exact step algorithm to an inexact step algorithm. This also
+means that ``CGNR`` can only be used with the ``LEVENBERG_MARQUARDT``
+and not the ``DOGLEG`` trust region strategy.
 
-When the user chooses ``CGNR`` as the linear solver, Ceres
-automatically switches from the exact step algorithm to an inexact
-step algorithm.
+``CGNR`` by default runs on the CPU. However, if an NVIDIA GPU is
+available and Ceres Solver has been built with CUDA support enabled,
+then the user can also choose to run ``CGNR`` on the GPU by setting
+:member:`Solver::Options::sparse_linear_algebra_library_type` to
+``CUDA_SPARSE``. The key complexity of ``CGNR`` comes from evaluating
+the two sparse-matrix vector products (SpMV) :math:`Jx` and
+:math:`J^\top y`. GPUs are particularly well suited for doing sparse
+matrix-vector products. As a result, for large problems using a GPU
+can lead to a substantial speedup.
+
+The convergence of Conjugate Gradients depends on the condition
+number :math:`\kappa(H)`. Usually :math:`H` is quite poorly
+conditioned and a `Preconditioner
+<https://en.wikipedia.org/wiki/Preconditioner>`_ must be used to get
+reasonable performance. See section on :ref:`section-preconditioner`
+for more details.
 
 .. _section-schur:
 
-``DENSE_SCHUR`` & ``SPARSE_SCHUR``
-----------------------------------
+DENSE_SCHUR & SPARSE_SCHUR
+--------------------------
 
 While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
-adjustment problems, bundle adjustment problem have a special
-structure, and a more efficient scheme for solving :eq:`normal`
-can be constructed.
+adjustment problems, they have a special sparsity structure that can
+be exploited to solve the normal equations more efficiently.
 
-Suppose that the SfM problem consists of :math:`p` cameras and
-:math:`q` points and the variable vector :math:`x` has the block
-structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`. Where,
-:math:`y` and :math:`z` correspond to camera and point parameters,
-respectively.  Further, let the camera blocks be of size :math:`c` and
-the point blocks be of size :math:`s` (for most problems :math:`c` =
-:math:`6`--`9` and :math:`s = 3`). Ceres does not impose any constancy
-requirement on these block sizes, but choosing them to be constant
-simplifies the exposition.
+Suppose that the bundle adjustment problem consists of :math:`p`
+cameras and :math:`q` points and the variable vector :math:`x` has the
+block structure :math:`x = [y_{1}, ... ,y_{p},z_{1},
+... ,z_{q}]`. Where, :math:`y` and :math:`z` correspond to camera and
+point parameters respectively.  Further, let the camera blocks be of
+size :math:`c` and the point blocks be of size :math:`s` (for most
+problems :math:`c` = :math:`6`--`9` and :math:`s = 3`). Ceres does not
+impose any constancy requirement on these block sizes, but choosing
+them to be constant simplifies the exposition.
 
-A key characteristic of the bundle adjustment problem is that there is
-no term :math:`f_{i}` that includes two or more point blocks.  This in
-turn implies that the matrix :math:`H` is of the form
+The key property of bundle adjustment problems which we will exploit
+is the fact that no term :math:`f_{i}` in :eq:`nonlinsq` includes two
+or more point blocks at the same time.  This in turn implies that the
+matrix :math:`H` is of the form
 
 .. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
    :label: hblock
@@ -585,123 +730,208 @@
 observe at least one common point.
 
 
-Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
-:math:`\Delta y`, and then back-substituting :math:`\Delta y` to
+Now :eq:`linear2` can be solved by first forming :math:`S`, solving
+for :math:`\Delta y`, and then back-substituting :math:`\Delta y` to
 obtain the value of :math:`\Delta z`.  Thus, the solution of what was
 an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
 inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
 and matrix-vector multiplies, and the solution of block sparse
 :math:`pc\times pc` linear system :eq:`schur`.  For almost all
 problems, the number of cameras is much smaller than the number of
-points, :math:`p \ll q`, thus solving :eq:`schur` is
-significantly cheaper than solving :eq:`linear2`. This is the
-*Schur complement trick* [Brown]_.
+points, :math:`p \ll q`, thus solving :eq:`schur` is significantly
+cheaper than solving :eq:`linear2`. This is the *Schur complement
+trick* [Brown]_.
 
-This still leaves open the question of solving :eq:`schur`. The
-method of choice for solving symmetric positive definite systems
-exactly is via the Cholesky factorization [TrefethenBau]_ and
-depending upon the structure of the matrix, there are, in general, two
-options. The first is direct factorization, where we store and factor
-:math:`S` as a dense matrix [TrefethenBau]_. This method has
+This still leaves open the question of solving :eq:`schur`. As we
+discussed when considering the exact solution of the normal equations
+using Cholesky factorization, we have two options.
+
+1. ``DENSE_SCHUR`` - The first is **dense Cholesky factorization**,
+where we store and factor :math:`S` as a dense matrix. This method has
 :math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
-is only practical for problems with up to a few hundred cameras. Ceres
-implements this strategy as the ``DENSE_SCHUR`` solver.
+is only practical for problems with up to a few hundred cameras.
 
-
-But, :math:`S` is typically a fairly sparse matrix, as most images
-only see a small fraction of the scene. This leads us to the second
-option: Sparse Direct Methods. These methods store :math:`S` as a
+2. ``SPARSE_SCHUR`` - For large bundle adjustment problems :math:`S`
+is typically a fairly sparse matrix, as most images only see a small
+fraction of the scene. This leads us to the second option: **sparse
+Cholesky factorization** [Davis]_.  Here we store :math:`S` as a
 sparse matrix, use row and column re-ordering algorithms to maximize
 the sparsity of the Cholesky decomposition, and focus their compute
-effort on the non-zero part of the factorization [Chen]_. Sparse
-direct methods, depending on the exact sparsity structure of the Schur
-complement, allow bundle adjustment algorithms to significantly scale
-up over those based on dense factorization. Ceres implements this
-strategy as the ``SPARSE_SCHUR`` solver.
+effort on the non-zero part of the factorization [Davis]_
+[Chen]_. Sparse direct methods, depending on the exact sparsity
+structure of the Schur complement, allow bundle adjustment algorithms
+to scale to scenes with thousands of cameras.
+
 
 .. _section-iterative_schur:
 
-``ITERATIVE_SCHUR``
--------------------
+ITERATIVE_SCHUR
+---------------
 
-Another option for bundle adjustment problems is to apply
-Preconditioned Conjugate Gradients to the reduced camera matrix
-:math:`S` instead of :math:`H`. One reason to do this is that
-:math:`S` is a much smaller matrix than :math:`H`, but more
-importantly, it can be shown that :math:`\kappa(S)\leq \kappa(H)`.
+Another option for bundle adjustment problems is to apply Conjugate
+Gradients to the reduced camera matrix :math:`S` instead of
+:math:`H`. One reason to do this is that :math:`S` is a much smaller
+matrix than :math:`H`, but more importantly, it can be shown that
+:math:`\kappa(S)\leq \kappa(H)` [Agarwal]_.
+
 Ceres implements Conjugate Gradients on :math:`S` as the
 ``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
 as the linear solver, Ceres automatically switches from the exact step
 algorithm to an inexact step algorithm.
 
+
 The key computational operation when using Conjuagate Gradients is the
 evaluation of the matrix vector product :math:`Sx` for an arbitrary
-vector :math:`x`. There are two ways in which this product can be
-evaluated, and this can be controlled using
-``Solver::Options::use_explicit_schur_complement``. Depending on the
-problem at hand, the performance difference between these two methods
-can be quite substantial.
+vector :math:`x`. Because Conjugate Gradients only needs access to
+:math:`S` via its product with a vector, one way to evaluate
+:math:`Sx` is to observe that
 
-  1. **Implicit** This is default. Implicit evaluation is suitable for
-     large problems where the cost of computing and storing the Schur
-     Complement :math:`S` is prohibitive. Because PCG only needs
-     access to :math:`S` via its product with a vector, one way to
-     evaluate :math:`Sx` is to observe that
+.. math::  x_1 &= E^\top x\\
+           x_2 &= C^{-1} x_1\\
+           x_3 &= Ex_2\\
+           x_4 &= Bx\\
+           Sx &= x_4 - x_3
+   :label: schurtrick1
 
-     .. math::  x_1 &= E^\top x\\
-                x_2 &= C^{-1} x_1\\
-                x_3 &= Ex_2\\
-                x_4 &= Bx\\
-                Sx &= x_4 - x_3
-        :label: schurtrick1
+Thus, we can run Conjugate Gradients on :math:`S` with the same
+computational effort per iteration as Conjugate Gradients on
+:math:`H`, while reaping the benefits of a more powerful
+preconditioner. In fact, we do not even need to compute :math:`H`;
+:eq:`schurtrick1` can be implemented using just the columns of
+:math:`J`.
 
-     Thus, we can run PCG on :math:`S` with the same computational
-     effort per iteration as PCG on :math:`H`, while reaping the
-     benefits of a more powerful preconditioner. In fact, we do not
-     even need to compute :math:`H`, :eq:`schurtrick1` can be
-     implemented using just the columns of :math:`J`.
+Equation :eq:`schurtrick1` is closely related to *Domain Decomposition
+methods* for solving large linear systems that arise in structural
+engineering and partial differential equations. In the language of
+Domain Decomposition, each point in a bundle adjustment problem is a
+domain, and the cameras form the interface between these domains. The
+iterative solution of the Schur complement then falls within the
+sub-category of techniques known as Iterative Sub-structuring [Saad]_
+[Mathew]_.
 
-     Equation :eq:`schurtrick1` is closely related to *Domain
-     Decomposition methods* for solving large linear systems that
-     arise in structural engineering and partial differential
-     equations. In the language of Domain Decomposition, each point in
-     a bundle adjustment problem is a domain, and the cameras form the
-     interface between these domains. The iterative solution of the
-     Schur complement then falls within the sub-category of techniques
-     known as Iterative Sub-structuring [Saad]_ [Mathew]_.
+While in most cases the above method for evaluating :math:`Sx` is the
+way to go, for some problems it is better to compute the Schur
+complement :math:`S` explicitly and then run Conjugate Gradients on
+it. This can be done by setting
+:member:`Solver::Options::use_explicit_schur_complement` to
+``true``. This option can only be used with the ``SCHUR_JACOBI``
+preconditioner.
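+
+For example (a sketch):
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = ITERATIVE_SCHUR;
+   Solver::Options::preconditioner_type = SCHUR_JACOBI;
+   Solver::Options::use_explicit_schur_complement = true;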
 
-  2. **Explicit** The complexity of implicit matrix-vector product
-     evaluation scales with the number of non-zeros in the
-     Jacobian. For small to medium sized problems, the cost of
-     constructing the Schur Complement is small enough that it is
-     better to construct it explicitly in memory and use it to
-     evaluate the product :math:`Sx`.
 
-When the user chooses ``ITERATIVE_SCHUR`` as the linear solver, Ceres
-automatically switches from the exact step algorithm to an inexact
-step algorithm.
+.. _section-schur_power_series_expansion:
 
-  .. NOTE::
+SCHUR_POWER_SERIES_EXPANSION
+----------------------------
 
-     In exact arithmetic, the choice of implicit versus explicit Schur
-     complement would have no impact on solution quality. However, in
-     practice if the Jacobian is poorly conditioned, one may observe
-     (usually small) differences in solution quality. This is a
-     natural consequence of performing computations in finite arithmetic.
+It can be shown that the inverse of the Schur complement can be
+written as an infinite power-series [Weber]_ [Zheng]_:
 
+.. math:: S      &= B - EC^{-1}E^\top\\
+                 &= B(I - B^{-1}EC^{-1}E^\top)\\
+          S^{-1} &= (I - B^{-1}EC^{-1}E^\top)^{-1} B^{-1}\\
+                 &= \sum_{i=0}^\infty \left(B^{-1}EC^{-1}E^\top\right)^{i} B^{-1}
+
+As a result a truncated version of this power series expansion can be
+used to approximate the inverse and therefore the solution to
+:eq:`schur`. Ceres allows the user to use Schur power series expansion
+in three ways.
+
+1. As a linear solver. This is what [Weber]_ calls **Power Bundle
+   Adjustment** and corresponds to using the truncated power series to
+   approximate the inverse of the Schur complement. This is done by
+   setting the following options.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR;
+      Solver::Options::preconditioner_type = IDENTITY;
+      Solver::Options::use_spse_initialization = true;
+      Solver::Options::max_linear_solver_iterations = 0;
+
+      // The following two settings are worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+      Solver::Options::spse_tolerance = 0.1;
+
+
+2. As a preconditioner for ``ITERATIVE_SCHUR``. Any method for
+   approximating the inverse of a matrix can also be used as a
+   preconditioner. This is enabled by setting the following options.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR;
+      Solver::Options::preconditioner_type = SCHUR_POWER_SERIES_EXPANSION;
+      Solver::Options::use_spse_initialization = false;
+
+      // This is worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+
+
+3. As initialization for ``ITERATIVE_SCHUR`` with any
+   preconditioner. This is a combination of the above two, where the
+   Schur Power Series Expansion is used to compute the initial guess
+   for the solution of :eq:`schur`, which is then refined by the
+   preconditioned Conjugate Gradients iterations.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR;
+      Solver::Options::preconditioner_type = ... // Preconditioner of your choice.
+      Solver::Options::use_spse_initialization = true;
+      Solver::Options::max_linear_solver_iterations = 0;
+
+      // The following two settings are worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+      // This only affects the initialization but not the preconditioner.
+      Solver::Options::spse_tolerance = 0.1;
+
+
+.. _section-mixed-precision:
+
+Mixed Precision Solves
+======================
+
+Generally speaking Ceres Solver does all its arithmetic in double
+precision. Sometimes though, one can use single precision arithmetic
+to get substantial speedups. Currently, for linear solvers that
+perform Cholesky factorization (sparse or dense) the user has the
+option to cast the linear system to single precision and then use
+single precision Cholesky factorization routines to solve the
+resulting linear system. This can be enabled by setting
+:member:`Solver::Options::use_mixed_precision_solves` to ``true``.
+
+Depending on the conditioning of the problem, the use of single
+precision factorization may lead to some loss of accuracy. Some of
+this accuracy can be recovered by performing `Iterative Refinement
+<https://en.wikipedia.org/wiki/Iterative_refinement>`_. The number of
+iterations of iterative refinement is controlled by
+:member:`Solver::Options::max_num_refinement_iterations`. The default
+value of this parameter is zero, which means if
+:member:`Solver::Options::use_mixed_precision_solves` is ``true``,
+then no iterative refinement is performed. Usually 2-3 refinement
+iterations are enough.
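+
+For example, a sketch of a mixed precision sparse Cholesky solve with
+iterative refinement, using the options named above:
+
+.. code-block:: c++
+
+   Solver::Options::linear_solver_type = SPARSE_NORMAL_CHOLESKY;
+   Solver::Options::use_mixed_precision_solves = true;
+   Solver::Options::max_num_refinement_iterations = 3;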
+
+Mixed precision solves are available in the following linear solver
+configurations:
+
+1. ``DENSE_NORMAL_CHOLESKY`` + ``EIGEN`` / ``LAPACK`` / ``CUDA``.
+2. ``DENSE_SCHUR`` + ``EIGEN`` / ``LAPACK`` / ``CUDA``.
+3. ``SPARSE_NORMAL_CHOLESKY`` + ``EIGEN_SPARSE`` / ``ACCELERATE_SPARSE``.
+4. ``SPARSE_SCHUR`` + ``EIGEN_SPARSE`` / ``ACCELERATE_SPARSE``.
+
+Mixed precision solves are not available when using ``SUITE_SPARSE``
+as the sparse linear algebra backend because SuiteSparse/CHOLMOD does
+not support single precision solves.
 
 .. _section-preconditioner:
 
-Preconditioner
-==============
+Preconditioners
+===============
 
-The convergence rate of Conjugate Gradients for
-solving :eq:`normal` depends on the distribution of eigenvalues
-of :math:`H` [Saad]_. A useful upper bound is
-:math:`\sqrt{\kappa(H)}`, where, :math:`\kappa(H)` is the condition
-number of the matrix :math:`H`. For most bundle adjustment problems,
-:math:`\kappa(H)` is high and a direct application of Conjugate
-Gradients to :eq:`normal` results in extremely poor performance.
+The convergence rate of Conjugate Gradients for solving :eq:`normal`
+depends on the distribution of eigenvalues of :math:`H` [Saad]_. A
+useful upper bound is :math:`\sqrt{\kappa(H)}`, where,
+:math:`\kappa(H)` is the condition number of the matrix :math:`H`. For
+most non-linear least squares problems, :math:`\kappa(H)` is high and
+a direct application of Conjugate Gradients to :eq:`normal` results in
+extremely poor performance.
 
 The solution to this problem is to replace :eq:`normal` with a
 *preconditioned* system.  Given a linear system, :math:`Ax =b` and a
@@ -733,8 +963,15 @@
 problems with general sparsity as well as the special sparsity
 structure encountered in bundle adjustment problems.
 
-``JACOBI``
-----------
+IDENTITY
+--------
+
+This is equivalent to using an identity matrix as a preconditioner,
+i.e. no preconditioner at all.
+
+
+JACOBI
+------
 
 The simplest of all preconditioners is the diagonal or Jacobi
 preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
@@ -742,24 +979,27 @@
 block Jacobi preconditioner. The ``JACOBI`` preconditioner in Ceres
 when used with :ref:`section-cgnr` refers to the block diagonal of
 :math:`H` and when used with :ref:`section-iterative_schur` refers to
-the block diagonal of :math:`B` [Mandel]_. For detailed performance
-data about the performance of ``JACOBI`` on bundle adjustment problems
-see [Agarwal]_.
+the block diagonal of :math:`B` [Mandel]_.
+
+For detailed performance data about ``JACOBI`` on bundle adjustment
+problems see [Agarwal]_.
 
 
-``SCHUR_JACOBI``
-----------------
+SCHUR_JACOBI
+------------
 
 Another obvious choice for :ref:`section-iterative_schur` is the block
 diagonal of the Schur complement matrix :math:`S`, i.e, the block
 Jacobi preconditioner for :math:`S`. In Ceres we refer to it as the
-``SCHUR_JACOBI`` preconditioner.  For detailed performance data about
-the performance of ``SCHUR_JACOBI`` on bundle adjustment problems see
-[Agarwal]_.
+``SCHUR_JACOBI`` preconditioner.
 
 
-``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``
-----------------------------------------------
+For detailed performance data about ``SCHUR_JACOBI`` on bundle
+adjustment problems see [Agarwal]_.
+
+
+CLUSTER_JACOBI and CLUSTER_TRIDIAGONAL
+--------------------------------------
 
 For bundle adjustment problems arising in reconstruction from
 community photo collections, more effective preconditioners can be
@@ -790,8 +1030,19 @@
 algorithm. The choice of clustering algorithm is controlled by
 :member:`Solver::Options::visibility_clustering_type`.
 
-``SUBSET``
-----------
+SCHUR_POWER_SERIES_EXPANSION
+----------------------------
+
+As explained in :ref:`section-schur_power_series_expansion`, the Schur
+complement matrix admits a power series expansion and a truncated
+version of this power series can be used as a preconditioner for
+``ITERATIVE_SCHUR``. When used as a preconditioner
+:member:`Solver::Options::max_num_spse_iterations` controls the number
+of terms in the power series that are used.
+
+
+SUBSET
+------
 
 This is a preconditioner for problems with general sparsity. Given a
 subset of residual blocks of a problem, it uses the corresponding
@@ -811,6 +1062,8 @@
 :math:`Q` approximates :math:`J^\top J`, or how well the chosen
 residual blocks approximate the full problem.
 
+This preconditioner is NOT available when running ``CGNR`` using
+``CUDA``.
 
 .. _section-ordering:
 
@@ -821,33 +1074,36 @@
 have a significant impact on the efficiency and accuracy of the
 method. For example when doing sparse Cholesky factorization, there
 are matrices for which a good ordering will give a Cholesky factor
-with :math:`O(n)` storage, where as a bad ordering will result in an
+with :math:`O(n)` storage, whereas a bad ordering will result in a
 completely dense factor.
 
 Ceres allows the user to provide varying amounts of hints to the
 solver about the variable elimination ordering to use. This can range
-from no hints, where the solver is free to decide the best ordering
-based on the user's choices like the linear solver being used, to an
-exact order in which the variables should be eliminated, and a variety
-of possibilities in between.
+from no hints, where the solver is free to decide the best possible
+ordering based on the user's choices like the linear solver being
+used, to an exact order in which the variables should be eliminated,
+and a variety of possibilities in between.
 
-Instances of the :class:`ParameterBlockOrdering` class are used to
-communicate this information to Ceres.
+The simplest thing to do is to just set
+:member:`Solver::Options::linear_solver_ordering_type` to ``AMD``
+(default) or ``NESDIS`` based on your understanding of the problem or
+empirical testing.
 
-Formally an ordering is an ordered partitioning of the parameter
-blocks. Each parameter block belongs to exactly one group, and each
-group has a unique integer associated with it, that determines its
-order in the set of groups. We call these groups *Elimination Groups*
 
-Given such an ordering, Ceres ensures that the parameter blocks in the
-lowest numbered elimination group are eliminated first, and then the
-parameter blocks in the next lowest numbered elimination group and so
-on. Within each elimination group, Ceres is free to order the
-parameter blocks as it chooses. For example, consider the linear system
+More information can be communicated by using an instance of the
+:class:`ParameterBlockOrdering` class.
+
+Formally an ordering is an ordered partitioning of the
+parameter blocks, i.e., each parameter block belongs to exactly
+one group, and each group has a unique non-negative integer
+associated with it that determines its order in the set of
+groups.
+
+For example, consider the linear system
 
 .. math::
-  x + y &= 3\\
-  2x + 3y &= 7
+   x + y &= 3 \\
+   2x + 3y &= 7
 
 There are two ways in which it can be solved. First eliminating
 :math:`x` from the two equations, solving for :math:`y` and then back
@@ -855,33 +1111,92 @@
 for :math:`x` and back substituting for :math:`y`. The user can
 construct three orderings here.
 
-1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
-2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
-3. :math:`\{0: x, y\}`        : Solver gets to decide the elimination order.
+1. :math:`\{0: x\}, \{1: y\}` - eliminate :math:`x` first.
+2. :math:`\{0: y\}, \{1: x\}` - eliminate :math:`y` first.
+3. :math:`\{0: x, y\}` - Solver gets to decide the elimination order.
 
-Thus, to have Ceres determine the ordering automatically using
-heuristics, put all the variables in the same elimination group. The
-identity of the group does not matter. This is the same as not
-specifying an ordering at all. To control the ordering for every
-variable, create an elimination group per variable, ordering them in
-the desired order.
+Thus, to have Ceres determine the ordering automatically, put all the
+variables in group 0. To control the ordering for every variable,
+create groups :math:`0 \dots N-1`, one per variable, in the desired
+order.
+
+``linear_solver_ordering == nullptr`` and an ordering where all the
+parameter blocks are in one elimination group mean the same thing -
+the solver is free to choose what it thinks is the best elimination
+ordering using the ordering algorithm (specified using
+:member:`Solver::Options::linear_solver_ordering_type`). Therefore in
+the following we will only consider the case where
+``linear_solver_ordering != nullptr``.
+
+The exact interpretation of the ``linear_solver_ordering`` depends on
+the values of :member:`Solver::Options::linear_solver_ordering_type`,
+:member:`Solver::Options::linear_solver_type`,
+:member:`Solver::Options::preconditioner_type` and
+:member:`Solver::Options::sparse_linear_algebra_library_type` as we will
+explain below.
+
+Bundle Adjustment
+-----------------
 
 If the user is using one of the Schur solvers (``DENSE_SCHUR``,
 ``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses to specify an
 ordering, it must have one important property. The lowest numbered
 elimination group must form an independent set in the graph
 corresponding to the Hessian, or in other words, no two parameter
-blocks in in the first elimination group should co-occur in the same
+blocks in the first elimination group should co-occur in the same
 residual block. For the best performance, this elimination group
 should be as large as possible. For standard bundle adjustment
 problems, this corresponds to the first elimination group containing
-all the 3d points, and the second containing the all the cameras
-parameter blocks.
+all the 3d points, and the second containing the parameter blocks for
+all the cameras.
 
 If the user leaves the choice to Ceres, then the solver uses an
 approximate maximum independent set algorithm to identify the first
 elimination group [LiSaad]_.
 
+``sparse_linear_algebra_library_type = SUITE_SPARSE``
+-----------------------------------------------------
+
+**linear_solver_ordering_type = AMD**
+
+A constrained Approximate Minimum Degree (CAMD) ordering is used where
+the parameter blocks in the lowest numbered group are eliminated
+first, and then the parameter blocks in the next lowest numbered group
+and so on. Within each group, CAMD is free to order the parameter blocks
+as it chooses.
+
+**linear_solver_ordering_type = NESDIS**
+
+a. ``linear_solver_type = SPARSE_NORMAL_CHOLESKY`` or
+   ``linear_solver_type = CGNR`` and ``preconditioner_type = SUBSET``
+
+   The value of ``linear_solver_ordering`` is ignored and a Nested
+   Dissection algorithm is used to compute a fill reducing ordering.
+
+b. ``linear_solver_type = SPARSE_SCHUR/DENSE_SCHUR/ITERATIVE_SCHUR``
+
+   Only the lowest numbered group is used to compute the Schur
+   complement, and Nested Dissection is used to compute a fill
+   reducing ordering for the Schur complement (or its
+   preconditioner).
+
+``sparse_linear_algebra_library_type = EIGEN_SPARSE/ACCELERATE_SPARSE``
+-----------------------------------------------------------------------
+
+a. ``linear_solver_type = SPARSE_NORMAL_CHOLESKY`` or
+   ``linear_solver_type = CGNR`` and ``preconditioner_type = SUBSET``
+
+   The value of ``linear_solver_ordering`` is ignored and ``AMD`` or
+   ``NESDIS`` is used to compute a fill reducing ordering as requested
+   by the user.
+
+b. ``linear_solver_type = SPARSE_SCHUR/DENSE_SCHUR/ITERATIVE_SCHUR``
+
+   Only the lowest numbered group is used to compute the Schur
+   complement, and ``AMD`` or ``NESDIS`` is used to compute a fill
+   reducing ordering for the Schur complement (or its preconditioner)
+   as requested by the user.
+
+
 .. _section-solver-options:
 
 :class:`Solver::Options`
@@ -892,7 +1207,7 @@
    :class:`Solver::Options` controls the overall behavior of the
    solver. We list the various settings and their default values below.
 
-.. function:: bool Solver::Options::IsValid(string* error) const
+.. function:: bool Solver::Options::IsValid(std::string* error) const
 
    Validate the values in the options struct and returns true on
    success. If there is a problem, the method returns false with
@@ -913,14 +1228,18 @@
    Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
    ``BFGS`` and ``LBFGS``.
 
+   See :ref:`section-line-search-methods` for more details.
+
 .. member:: LineSearchType Solver::Options::line_search_type
 
    Default: ``WOLFE``
 
    Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
    Note that in order for the assumptions underlying the ``BFGS`` and
-   ``LBFGS`` line search direction algorithms to be guaranteed to be
-   satisifed, the ``WOLFE`` line search should be used.
+   ``LBFGS`` line search direction algorithms to be satisfied, the
+   ``WOLFE`` line search must be used.
+
+   See :ref:`section-line-search-methods` for more details.
 
 .. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type
 
@@ -931,26 +1250,28 @@
 
 .. member:: int Solver::Options::max_lbfgs_rank
 
-   Default: 20
+   Default: ``20``
 
-   The L-BFGS hessian approximation is a low rank approximation to the
-   inverse of the Hessian matrix. The rank of the approximation
-   determines (linearly) the space and time complexity of using the
-   approximation. Higher the rank, the better is the quality of the
-   approximation. The increase in quality is however is bounded for a
-   number of reasons.
+   The LBFGS Hessian approximation is a low rank approximation to
+   the inverse of the Hessian matrix. The rank of the
+   approximation determines (linearly) the space and time
+   complexity of using the approximation. The higher the rank, the
+   better the quality of the approximation. The increase in
+   quality is however bounded for a number of reasons.
 
-     1. The method only uses secant information and not actual
-        derivatives.
+   1. The method only uses secant information and not actual
+   derivatives.
+   2. The Hessian approximation is constrained to be positive
+   definite.
 
-     2. The Hessian approximation is constrained to be positive
-        definite.
+   So increasing this rank to a large number will increase the time
+   and space complexity without a corresponding increase in solution
+   quality. There are no hard and fast rules for choosing the
+   maximum rank. The best choice usually requires some problem
+   specific experimentation.
 
-   So increasing this rank to a large number will cost time and space
-   complexity without the corresponding increase in solution
-   quality. There are no hard and fast rules for choosing the maximum
-   rank. The best choice usually requires some problem specific
-   experimentation.
+   For more theoretical and implementation details of the LBFGS
+   method, please see [Nocedal]_.
 
 .. member:: bool Solver::Options::use_approximate_eigenvalue_bfgs_scaling
 
@@ -999,6 +1320,8 @@
 
 .. member:: double Solver::Options::min_line_search_step_size
 
+   Default: ``1e-9``
+
    The line search terminates if:
 
    .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}
@@ -1153,7 +1476,8 @@
 
 .. member:: double Solver::Options::max_solver_time_in_seconds
 
-   Default: ``1e6``
+   Default: ``1e9``
+
    Maximum amount of time for which the solver should run.
 
 .. member:: int Solver::Options::num_threads
@@ -1239,8 +1563,8 @@
 
    where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
    is projection onto the bounds constraints and :math:`\boxplus` is
-   Plus operation for the overall local parameterization associated
-   with the parameter vector.
+   Plus operation for the overall manifold associated with the
+   parameter vector.
 
 .. member:: double Solver::Options::parameter_tolerance
 
@@ -1260,7 +1584,7 @@
    Type of linear solver used to compute the solution to the linear
    least squares problem in each iteration of the Levenberg-Marquardt
    algorithm. If Ceres is built with support for ``SuiteSparse`` or
-   ``CXSparse`` or ``Eigen``'s sparse Cholesky factorization, the
+   ``Accelerate`` or ``Eigen``'s sparse Cholesky factorization, the
    default is ``SPARSE_NORMAL_CHOLESKY``, it is ``DENSE_QR``
    otherwise.
 
@@ -1271,8 +1595,9 @@
    The preconditioner used by the iterative linear solver. The default
    is the block Jacobi preconditioner. Valid values are (in increasing
    order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``,
-   ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See
-   :ref:`section-preconditioner` for more details.
+   ``CLUSTER_JACOBI``, ``CLUSTER_TRIDIAGONAL``, ``SUBSET`` and
+   ``SCHUR_POWER_SERIES_EXPANSION``. See :ref:`section-preconditioner`
+   for more details.
 
 .. member:: VisibilityClusteringType Solver::Options::visibility_clustering_type
 
@@ -1296,7 +1621,7 @@
    recommend that you try ``CANONICAL_VIEWS`` first and if it is too
    expensive try ``SINGLE_LINKAGE``.
 
-.. member:: std::unordered_set<ResidualBlockId> residual_blocks_for_subset_preconditioner
+.. member:: std::unordered_set<ResidualBlockId> Solver::Options::residual_blocks_for_subset_preconditioner
 
    ``SUBSET`` preconditioner is a preconditioner for problems with
    general sparsity. Given a subset of residual blocks of a problem,
@@ -1321,65 +1646,76 @@
 
 .. member:: DenseLinearAlgebraLibrary Solver::Options::dense_linear_algebra_library_type
 
-   Default:``EIGEN``
+   Default: ``EIGEN``
 
    Ceres supports using multiple dense linear algebra libraries for
-   dense matrix factorizations. Currently ``EIGEN`` and ``LAPACK`` are
-   the valid choices. ``EIGEN`` is always available, ``LAPACK`` refers
-   to the system ``BLAS + LAPACK`` library which may or may not be
-   available.
+   dense matrix factorizations. Currently ``EIGEN``, ``LAPACK`` and
+   ``CUDA`` are the valid choices. ``EIGEN`` is always available,
+   ``LAPACK`` refers to the system ``BLAS + LAPACK`` library which may
+   or may not be available. ``CUDA`` refers to Nvidia's GPU based
+   dense linear algebra library which may or may not be available.
 
    This setting affects the ``DENSE_QR``, ``DENSE_NORMAL_CHOLESKY``
-   and ``DENSE_SCHUR`` solvers. For small to moderate sized probem
+   and ``DENSE_SCHUR`` solvers. For small to moderate sized problems
    ``EIGEN`` is a fine choice but for large problems, an optimized
-   ``LAPACK + BLAS`` implementation can make a substantial difference
-   in performance.
+   ``LAPACK + BLAS`` or ``CUDA`` implementation can make a substantial
+   difference in performance.
 
 .. member:: SparseLinearAlgebraLibrary Solver::Options::sparse_linear_algebra_library_type
 
    Default: The highest available according to: ``SUITE_SPARSE`` >
-   ``CX_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``
+   ``ACCELERATE_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``
 
    Ceres supports the use of three sparse linear algebra libraries,
    ``SuiteSparse``, which is enabled by setting this parameter to
-   ``SUITE_SPARSE``, ``CXSparse``, which can be selected by setting
-   this parameter to ``CX_SPARSE`` and ``Eigen`` which is enabled by
-   setting this parameter to ``EIGEN_SPARSE``.  Lastly, ``NO_SPARSE``
-   means that no sparse linear solver should be used; note that this is
-   irrespective of whether Ceres was compiled with support for one.
+   ``SUITE_SPARSE``, ``Accelerate``, which can be selected by setting
+   this parameter to ``ACCELERATE_SPARSE`` and ``Eigen`` which is
+   enabled by setting this parameter to ``EIGEN_SPARSE``.  Lastly,
+   ``NO_SPARSE`` means that no sparse linear solver should be used;
+   note that this is irrespective of whether Ceres was compiled with
+   support for one.
 
-   ``SuiteSparse`` is a sophisticated and complex sparse linear
-   algebra library and should be used in general.
+   ``SuiteSparse`` is a sophisticated sparse linear algebra library
+   and should be used in general. On MacOS you may want to use the
+   ``Accelerate`` framework.
 
    If your needs/platforms prevent you from using ``SuiteSparse``,
-   consider using ``CXSparse``, which is a much smaller, easier to
-   build library. As can be expected, its performance on large
-   problems is not comparable to that of ``SuiteSparse``.
+   consider using the sparse linear algebra routines in ``Eigen``. The
+   sparse Cholesky algorithms currently included with ``Eigen`` are
+   not as sophisticated as the ones in ``SuiteSparse`` and
+   ``Accelerate`` and as a result its performance is considerably
+   worse.
 
-   Last but not the least you can use the sparse linear algebra
-   routines in ``Eigen``. Currently the performance of this library is
-   the poorest of the three. But this should change in the near
-   future.
+.. member:: LinearSolverOrderingType Solver::Options::linear_solver_ordering_type
 
-   Another thing to consider here is that the sparse Cholesky
-   factorization libraries in Eigen are licensed under ``LGPL`` and
-   building Ceres with support for ``EIGEN_SPARSE`` will result in an
-   LGPL licensed library (since the corresponding code from Eigen is
-   compiled into the library).
+   Default: ``AMD``
 
-   The upside is that you do not need to build and link to an external
-   library to use ``EIGEN_SPARSE``.
+   The order in which variables are eliminated in a linear solver can
+   have a significant impact on the efficiency and accuracy of the
+   method. For example, when doing sparse Cholesky factorization,
+   there are matrices for which a good ordering will give a Cholesky
+   factor with :math:`O(n)` storage, whereas a bad ordering will
+   result in a completely dense factor.
 
+   Sparse direct solvers like ``SPARSE_NORMAL_CHOLESKY`` and
+   ``SPARSE_SCHUR`` use a fill reducing ordering of the columns and
+   rows of the matrix being factorized before computing the numeric
+   factorization.
 
-.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering
+   This enum controls the type of algorithm used to compute this fill
+   reducing ordering. There is no single algorithm that works on all
+   matrices, so determining which algorithm works better is a matter
+   of empirical experimentation.
 
-   Default: ``NULL``
+.. member:: std::shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering
+
+   Default: ``nullptr``
 
    An instance of the ordering object informs the solver about the
    desired order in which parameter blocks should be eliminated by the
-   linear solvers. See section~\ref{sec:ordering`` for more details.
+   linear solvers.
 
-   If ``NULL``, the solver is free to choose an ordering that it
+   If ``nullptr``, the solver is free to choose an ordering that it
    thinks is best.
 
    See :ref:`section-ordering` for more details.
@@ -1404,41 +1740,15 @@
    efficient to explicitly compute it and use it for evaluating the
    matrix-vector products.
 
-   Enabling this option tells ``ITERATIVE_SCHUR`` to use an explicitly
-   computed Schur complement. This can improve the performance of the
-   ``ITERATIVE_SCHUR`` solver significantly.
-
-   .. NOTE:
+   .. NOTE::
 
      This option can only be used with the ``SCHUR_JACOBI``
      preconditioner.
 
-.. member:: bool Solver::Options::use_post_ordering
+.. member:: bool Solver::Options::dynamic_sparsity
 
    Default: ``false``
 
-   Sparse Cholesky factorization algorithms use a fill-reducing
-   ordering to permute the columns of the Jacobian matrix. There are
-   two ways of doing this.
-
-   1. Compute the Jacobian matrix in some order and then have the
-      factorization algorithm permute the columns of the Jacobian.
-
-   2. Compute the Jacobian with its columns already permuted.
-
-   The first option incurs a significant memory penalty. The
-   factorization algorithm has to make a copy of the permuted Jacobian
-   matrix, thus Ceres pre-permutes the columns of the Jacobian matrix
-   and generally speaking, there is no performance penalty for doing
-   so.
-
-   In some rare cases, it is worth using a more complicated reordering
-   algorithm which has slightly better runtime performance at the
-   expense of an extra copy of the Jacobian matrix. Setting
-   ``use_postordering`` to ``true`` enables this tradeoff.
-
-.. member:: bool Solver::Options::dynamic_sparsity
-
   Some non-linear least squares problems are symbolically dense but
   numerically sparse, i.e., at any given state only a small number of
    Jacobian entries are non-zero, but the position and number of
@@ -1453,6 +1763,29 @@
 
    This setting only affects the `SPARSE_NORMAL_CHOLESKY` solver.
 
+.. member:: bool Solver::Options::use_mixed_precision_solves
+
+   Default: ``false``
+
+   If true, the Gauss-Newton matrix is computed in *double* precision,
+   but its factorization is computed in *single* precision. This can
+   result in significant time and memory savings at the cost of some
+   accuracy in the Gauss-Newton step. Iterative refinement is used to
+   recover some of this accuracy.
+
+   If ``use_mixed_precision_solves`` is true, we recommend setting
+   ``max_num_refinement_iterations`` to 2-3.
+
+   See :ref:`section-mixed-precision` for more details.
+
+.. member:: int Solver::Options::max_num_refinement_iterations
+
+   Default: ``0``
+
+   Number of steps of the iterative refinement process to run when
+   computing the Gauss-Newton step, see
+   :member:`Solver::Options::use_mixed_precision_solves`.
+
 .. member:: int Solver::Options::min_linear_solver_iterations
 
    Default: ``0``
@@ -1469,6 +1802,34 @@
    makes sense when the linear solver is an iterative solver, e.g.,
    ``ITERATIVE_SCHUR`` or ``CGNR``.
 
+.. member:: int Solver::Options::max_num_spse_iterations
+
+   Default: ``5``
+
+   Maximum number of iterations performed by
+   ``SCHUR_POWER_SERIES_EXPANSION``. Each iteration corresponds to one
+   more term in the power series expansion of the inverse of the Schur
+   complement. This value controls the maximum number of iterations
+   whether it is used as a preconditioner or just to initialize the
+   solution for ``ITERATIVE_SCHUR``.
+
+.. member:: bool Solver::Options::use_spse_initialization
+
+   Default: ``false``
+
+   Use Schur power series expansion to initialize the solution for
+   ``ITERATIVE_SCHUR``. This option can be set to ``true`` regardless
+   of which preconditioner is being used.
+
+.. member:: double Solver::Options::spse_tolerance
+
+   Default: ``0.1``
+
+   When ``use_spse_initialization`` is ``true``, this parameter along
+   with ``max_num_spse_iterations`` controls the number of
+   ``SCHUR_POWER_SERIES_EXPANSION`` iterations performed for
+   initialization. It is not used to control the preconditioner.
+
 .. member:: double Solver::Options::eta
 
    Default: ``1e-1``
@@ -1502,6 +1863,25 @@
    objects that have an :class:`EvaluationCallback` associated with
    them.
 
+.. member:: std::shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
+
+   Default: ``nullptr``
+
+   If :member:`Solver::Options::use_inner_iterations` is true, then
+   the user has two choices.
+
+   1. Let the solver heuristically decide which parameter blocks to
+      optimize in each inner iteration. To do this, set
+      :member:`Solver::Options::inner_iteration_ordering` to ``nullptr``.
+
+   2. Specify a collection of ordered independent sets. The lower
+      numbered groups are optimized before the higher numbered groups
+      during the inner optimization phase. Each group must be an
+      independent set. Not all parameter blocks need to be included in
+      the ordering.
+
+   See :ref:`section-ordering` for more details.
+
 .. member:: double Solver::Options::inner_iteration_tolerance
 
    Default: ``1e-3``
@@ -1516,37 +1896,21 @@
    inner iterations in subsequent trust region minimizer iterations is
    disabled.
 
-.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
-
-   Default: ``NULL``
-
-   If :member:`Solver::Options::use_inner_iterations` true, then the
-   user has two choices.
-
-   1. Let the solver heuristically decide which parameter blocks to
-      optimize in each inner iteration. To do this, set
-      :member:`Solver::Options::inner_iteration_ordering` to ``NULL``.
-
-   2. Specify a collection of of ordered independent sets. The lower
-      numbered groups are optimized before the higher number groups
-      during the inner optimization phase. Each group must be an
-      independent set. Not all parameter blocks need to be included in
-      the ordering.
-
-   See :ref:`section-ordering` for more details.
 
 .. member:: LoggingType Solver::Options::logging_type
 
    Default: ``PER_MINIMIZER_ITERATION``
 
+   Valid values are ``SILENT`` and ``PER_MINIMIZER_ITERATION``.
+
 .. member:: bool Solver::Options::minimizer_progress_to_stdout
 
    Default: ``false``
 
-   By default the :class:`Minimizer` progress is logged to ``STDERR``
+   By default the Minimizer's progress is logged to ``STDERR``
    depending on the ``vlog`` level. If this flag is set to true, and
-   :member:`Solver::Options::logging_type` is not ``SILENT``, the logging
-   output is sent to ``STDOUT``.
+   :member:`Solver::Options::logging_type` is not ``SILENT``, the
+   logging output is sent to ``STDOUT``.
 
    For ``TRUST_REGION_MINIMIZER`` the progress display looks like
 
@@ -1595,7 +1959,7 @@
   #. ``it`` is the time taken by the current iteration.
    #. ``tt`` is the total time taken by the minimizer.
 
-.. member:: vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
+.. member:: std::vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
 
    Default: ``empty``
 
@@ -1603,7 +1967,7 @@
    the trust region problem. Useful for testing and benchmarking. If
    ``empty``, no problems are dumped.
 
-.. member:: string Solver::Options::trust_region_problem_dump_directory
+.. member:: std::string Solver::Options::trust_region_problem_dump_directory
 
    Default: ``/tmp``
 
@@ -1614,7 +1978,7 @@
     :member:`Solver::Options::trust_region_problem_dump_format_type` is not
     ``CONSOLE``.
 
-.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format
+.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format_type
 
    Default: ``TEXTFILE``
 
@@ -1708,23 +2072,32 @@
    should not expect to look at the parameter blocks and interpret
    their values.
 
-.. member:: vector<IterationCallback> Solver::Options::callbacks
+.. member:: std::vector<IterationCallback*> Solver::Options::callbacks
+
+   Default: ``empty``
 
    Callbacks that are executed at the end of each iteration of the
-   :class:`Minimizer`. They are executed in the order that they are
+   minimizer. They are executed in the order that they are
    specified in this vector.
 
    By default, parameter blocks are updated only at the end of the
-   optimization, i.e., when the :class:`Minimizer` terminates. This
-   means that by default, if an :class:`IterationCallback` inspects
-   the parameter blocks, they will not see them changing in the course
-   of the optimization.
+   optimization, i.e., when the minimizer terminates. This means that
+   by default, if an :class:`IterationCallback` inspects the parameter
+   blocks, they will not see them changing in the course of the
+   optimization.
 
    To tell Ceres to update the parameter blocks at the end of each
    iteration and before calling the user's callback, set
    :member:`Solver::Options::update_state_every_iteration` to
    ``true``.
 
+   See `examples/iteration_callback_example.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/iteration_callback_example.cc>`_
+   for an example of an :class:`IterationCallback` that uses
+   :member:`Solver::Options::update_state_every_iteration` to log
+   changes to the parameter blocks over the course of the
+   optimization.
+
    The solver does NOT take ownership of these pointers.
 
 :class:`ParameterBlockOrdering`
@@ -1786,8 +2159,8 @@
 
    Number of groups with one or more elements.
 
-:class:`IterationCallback`
-==========================
+:class:`IterationSummary`
+=========================
 
 .. class:: IterationSummary
 
@@ -1863,7 +2236,7 @@
 
    Size of the trust region at the end of the current iteration. For
    the Levenberg-Marquardt algorithm, the regularization parameter is
-   1.0 / member::`IterationSummary::trust_region_radius`.
+   1.0 / :member:`IterationSummary::trust_region_radius`.
 
 .. member:: double IterationSummary::eta
 
@@ -1906,6 +2279,8 @@
 
    Time (in seconds) since the user called Solve().
 
+:class:`IterationCallback`
+==========================
 
 .. class:: IterationCallback
 
@@ -1939,6 +2314,10 @@
   #. ``SOLVER_CONTINUE`` indicates that the solver should continue
      optimizing.
 
+  The return values can be used to implement custom termination
+  criteria that supersede the iteration/time/tolerance based
+  termination implemented by Ceres.
+
   For example, the following :class:`IterationCallback` is used
   internally by Ceres to log the progress of the optimization.
 
@@ -1978,6 +2357,12 @@
     };
 
 
+  See `examples/evaluation_callback_example.cc
+  <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/evaluation_callback_example.cc>`_
+  for another example that uses
+  :member:`Solver::Options::update_state_every_iteration` to log
+  changes to the parameter blocks over the course of the optimization.
+
 
 :class:`CRSMatrix`
 ==================
@@ -1995,13 +2380,13 @@
 
    Number of columns.
 
-.. member:: vector<int> CRSMatrix::rows
+.. member:: std::vector<int> CRSMatrix::rows
 
    :member:`CRSMatrix::rows` is a :member:`CRSMatrix::num_rows` + 1
    sized array that points into the :member:`CRSMatrix::cols` and
    :member:`CRSMatrix::values` array.
 
-.. member:: vector<int> CRSMatrix::cols
+.. member:: std::vector<int> CRSMatrix::cols
 
    :member:`CRSMatrix::cols` contain as many entries as there are
    non-zeros in the matrix.
@@ -2009,7 +2394,7 @@
    For each row ``i``, ``cols[rows[i]]`` ... ``cols[rows[i + 1] - 1]``
    are the indices of the non-zero columns of row ``i``.
 
-.. member:: vector<int> CRSMatrix::values
+.. member:: std::vector<double> CRSMatrix::values
 
   :member:`CRSMatrix::values` contains as many entries as there are
    non-zeros in the matrix.
@@ -2043,12 +2428,12 @@
 
    Summary of the various stages of the solver after termination.
 
-.. function:: string Solver::Summary::BriefReport() const
+.. function:: std::string Solver::Summary::BriefReport() const
 
    A brief one line description of the state of the solver after
    termination.
 
-.. function:: string Solver::Summary::FullReport() const
+.. function:: std::string Solver::Summary::FullReport() const
 
    A full multiline description of the state of the solver after
    termination.
@@ -2071,7 +2456,7 @@
 
    The cause of the minimizer terminating.
 
-.. member:: string Solver::Summary::message
+.. member:: std::string Solver::Summary::message
 
    Reason why the solver terminated.
 
@@ -2091,14 +2476,14 @@
    were held fixed by the preprocessor because all the parameter
    blocks that they depend on were fixed.
 
-.. member:: vector<IterationSummary> Solver::Summary::iterations
+.. member:: std::vector<IterationSummary> Solver::Summary::iterations
 
    :class:`IterationSummary` for each minimizer iteration in order.
 
 .. member:: int Solver::Summary::num_successful_steps
 
    Number of minimizer iterations in which the step was
-   accepted. Unless :member:`Solver::Options::use_non_monotonic_steps`
+   accepted. Unless :member:`Solver::Options::use_nonmonotonic_steps`
    is `true` this is also the number of steps in which the objective
    function value/cost went down.
 
@@ -2112,13 +2497,13 @@
 
    Number of times inner iterations were performed.
 
- .. member:: int Solver::Summary::num_line_search_steps
+.. member:: int Solver::Summary::num_line_search_steps
 
-    Total number of iterations inside the line search algorithm across
-    all invocations. We call these iterations "steps" to distinguish
-    them from the outer iterations of the line search and trust region
-    minimizer algorithms which call the line search algorithm as a
-    subroutine.
+   Total number of iterations inside the line search algorithm across
+   all invocations. We call these iterations "steps" to distinguish
+   them from the outer iterations of the line search and trust region
+   minimizer algorithms which call the line search algorithm as a
+   subroutine.
 
 .. member:: double Solver::Summary::preprocessor_time_in_seconds
 
@@ -2126,7 +2511,7 @@
 
 .. member:: double Solver::Summary::minimizer_time_in_seconds
 
-   Time (in seconds) spent in the Minimizer.
+   Time (in seconds) spent in the minimizer.
 
 .. member:: double Solver::Summary::postprocessor_time_in_seconds
 
@@ -2180,7 +2565,7 @@
    Dimension of the tangent space of the problem (or the number of
    columns in the Jacobian for the problem). This is different from
    :member:`Solver::Summary::num_parameters` if a parameter block is
-   associated with a :class:`LocalParameterization`.
+   associated with a :class:`Manifold`.
 
 .. member:: int Solver::Summary::num_residual_blocks
 
@@ -2206,7 +2591,7 @@
    number of columns in the Jacobian for the reduced problem). This is
    different from :member:`Solver::Summary::num_parameters_reduced` if
    a parameter block in the reduced problem is associated with a
-   :class:`LocalParameterization`.
+   :class:`Manifold`.
 
 .. member:: int Solver::Summary::num_residual_blocks_reduced
 
@@ -2224,9 +2609,7 @@
 .. member:: int Solver::Summary::num_threads_used
 
    Number of threads actually used by the solver for Jacobian and
-   residual evaluation. This number is not equal to
-   :member:`Solver::Summary::num_threads_given` if none of `OpenMP`
-   or `CXX_THREADS` is available.
+   residual evaluation.
 
 .. member:: LinearSolverType Solver::Summary::linear_solver_type_given
 
@@ -2242,12 +2625,12 @@
    `SPARSE_NORMAL_CHOLESKY` but no sparse linear algebra library was
    available.
 
-.. member:: vector<int> Solver::Summary::linear_solver_ordering_given
+.. member:: std::vector<int> Solver::Summary::linear_solver_ordering_given
 
    Size of the elimination groups given by the user as hints to the
    linear solver.
 
-.. member:: vector<int> Solver::Summary::linear_solver_ordering_used
+.. member:: std::vector<int> Solver::Summary::linear_solver_ordering_used
 
    Size of the parameter groups used by the solver when ordering the
   columns of the Jacobian.  This may be different from
@@ -2285,12 +2668,12 @@
    actually performed. For example, in a problem with just one parameter
    block, inner iterations are not performed.
 
-.. member:: vector<int> inner_iteration_ordering_given
+.. member:: std::vector<int> Solver::Summary::inner_iteration_ordering_given
 
    Size of the parameter groups given by the user for performing inner
    iterations.
 
-.. member:: vector<int> inner_iteration_ordering_used
+.. member:: std::vector<int> Solver::Summary::inner_iteration_ordering_used
 
   Size of the parameter groups used by the solver for
   performing inner iterations. This may be different from
@@ -2315,7 +2698,7 @@
 
    Type of clustering algorithm used for visibility based
    preconditioning. Only meaningful when the
-   :member:`Solver::Summary::preconditioner_type` is
+   :member:`Solver::Summary::preconditioner_type_used` is
    ``CLUSTER_JACOBI`` or ``CLUSTER_TRIDIAGONAL``.
 
 .. member:: TrustRegionStrategyType Solver::Summary::trust_region_strategy_type