Squashed 'third_party/ceres/' changes from 399cda773..7f9cc571b

7f9cc571b Set CMAKE_CUDA_ARCHITECTURES depending on CUDAToolkit_VERSION
6a74af202 Remove a level of indirection when using CellInfo
f8c2994da Drop Ubuntu 20.04 and add Ubuntu 24.04 support
20954e693 Eliminate CUDA set but unused variable warning
57aba3ed0 Enable Apple linker library deduplication
1f15197be Drop macos-11 runner and support macos-14 instead
5de0fda0f Update Github actions
308a5bb43 Add missing include
715865101 Remove 32-bit MinGW from CI matrix
f71181a92 Remove remaining references to CXSparse
522210a08 Reuse macro to format version string
cd2dd06e9 Fix clang16 compiler warnings
1f2e6313a Add missing std qualifiers
125c06882 Link static cuda libs when ceres is build static
62c03d6ff Revert "Update Eigen::BDCSVD usage to comply with Eigen 3.4"
4027f6997 Update Eigen::BDCSVD usage to comply with Eigen 3.4
1b2ebabf5 Typo fixes in the documentation
2ffeb943a Fix typo in AutoDiffManifold comment and docs
da34da3da Remove CreateFakeBundleAdjustmentPartitionedJacobian
85b2c418a ClangTidy fixes
91773746b Simplify instantiation of cost functions and their functors
8b88a9ab4 Use C++17 Bessel functions
84436f791 Use native CMake TBB package configuration
556a56f21 Do not assume Homebrew usage
3fd2a72cc MinGW no longer provides a 32-bit METIS package
e776a6f1a Unbreak the Bazel build.
095d48392 Skip structure detection on preprocessor failure
a0876309a Fix typos in comments
774973731 Fix search on ceres-solver.org
85331393d Update docs for 2.2.0.
2120eae67 Optimize the computation of the LM diagonal in TinySolver
611b139b1 Fix Solver::Options::callbacks type in documentation
b652d3b4f Unbreak the windows build
b79c4d350 ClangTidy fixes
76af132d0 Update docs for 2.2.0RC3
dc7a85975 Single-threaded operations on small vectors
b379ab768 Remove MaxNumThreadsAvailable
354002f98 Schedule task in ParallerFor from the previous task
a9b3fcff4 Minor update to docs
5ccab18be Drop use of POSIX M_PI_2 and M_PI_4
4519b8d77 Drop use of POSIX M_PI
b83abdcb1 Add a default value for Solver::Summary::linear_solver_ordering_type
8d875a312 Fix checks for CUDA memory pools support
94335e3b9 More ClangTidy fixes
399395c4f Miscellaneous ClangTidy fixes
c8bed4b93 Update version_history for 2.2.0rc2
489339219 Rework MSVC warning suppression
0cea191d4 Move stream-ordered memory allocations
8e3b7d89e Fix a copy-pasta error
dc0bb8508 Various cleanups to the documentation
4588b0fbb Add an example for EvaluationCallback
0fc3fd4fc Add documentation for examples
e6b2f532b Parallelize PSE preconditioner
41672dff8 Add end-to-end BA tests for SCHUR_POWER_SERIES_EXPANSION
83ee376d8 Add an example for IterationCallback
371265094 Cleanup example code
59182a42c Update documentation
d4db6e6fe Fix typos in the documentation for EvaluationCallback
dffd8cd71 Add an accessor for the CostFunctor in DynamicAutoDiffCostFunction
bea247701 Add a missing include dir to the cuda kernels target.
18ea7d1c2 Runtime check for cudaMallocAsync support
a227045be Remove cuda-memcheck based tests
d10e786ca Remove an unused variable from CudaSparseMatrix
5a30cae58 Preparing for 2.2.0rc1
9cca67127 Enable compatibility with SuiteSparse 7.2.0
a1c02e8d3 Rework the Sphinx find module
a57e35bba Require at least CMake 3.16
863db948f Eliminate macOS sprintf warning
6f5342db6 Export ceres_cuda_kernels to project's build tree
d864d146f Add macOS 13 runner to Github workflows
01a23504d Add a workaround for CMake Policy CMP0148
de9cbde95 Work around MinGW32 manifold_test segfault
5e4b22f7f Update CudaSparseMatrix class
ed9921fc2 Fix Solver::Options in documentation
de62bf220 Two minor fixes
5f97455be Fix typos in documentation
5ba62abec Add CUDA support to windows CI builds
a98fdf582 Update CMakeLists.txt to fix Windows CUDA build
ec4907399 Fix block-sparse to crs conversion on windows
799ee91bb Fix check in CompressedRowJacobianWriter::CreateJacobian()
ee90f6cbf Detect large Jacobians and return failure instead of crashing.
310a252fb Deal with infinite initial cost correctly.
357482db7 Add NumTraits::max_digits10 for Jets
908b5b1b5 Fix type mismatch in documentation
75bacedf7 CUDA partitioned matrix view
fd6197ce0 Fixed GCC 13.1 compilation errors
df97a8f05 Improve support of older CUDA toolkit versions
085214ea7 Fix test CompressedRowSparseMatrix.Transpose
d880df09f Match new[] with delete[] in BSM
bdee4d617 Block-sparse to CRS conversion using block-structure
0f9de3daf Use page locked memory in BlockSparseMatrix
e7bd72d41 Permutation-based conversion from block-sparse to crs
abbc4e797 Explicitly compute number of non-zeros in row
96fdfd2e7 Implement tests for Euler conversion with jets
db1ebd3ff Work around lack of constexpr constructors for Jet
16a4fa04e Further Jet conversion fixes
92ad18b8a Fix a Jet conversion bug in rotation.h
a5e745d4e ClangTidy fixes
77ad8bb4e Change storage in BlockRandomAccessSparseMatrix
d340f81bd Clang Tidy fixes
54ad3dd03 Reorganize ParallelFor source files
ba360ab07 Change the value of BlockRandomAccessSparseMatrix::kMaxRowBlocks
0315c6ca9 Provide DynamicAutoDiffCostFunction deduction guide
3cdfae110 Replace Hertzberg mentions with citations
0af38a9fc Fix typos in documentation
4c969a6c1 Improve image of loss functions shape
b54f05b8e Add missing TukeyLoss to documentation
8bf4a2f42 Inexact check for ParallelAssign test
9cddce73a Explicit conversions from long to int in benchmarks (for num_threads)
5e787ab70 Using const ints in jacobian writers
f9bffbb6f Removing -Wshorten-64-to-32 warnings from examples (part 1)
e269b64f5 More ClangTidy fixes
f4eb768e0 Using int64 in file.cc. Fixing compilation error in array_utils.cc
74a0f0d24 Using int64_t for sizes and indexes in array utils
749a442d9 Clang-Tidy fixes
c4ba975ae Fix rotation_test.cc to work with older versions of Eigen
9602ed7b7 ClangFormat changes
79a554ffc Fix a bug in QuaternionRotatePoint.
f1113c08a Commenting unused parameters for better readibility
772d927e1 Replacing old style typedefs with new style usings
53df5ddcf Removing using std::...
e1ca3302a Increasing bazel timeout for heavy BA tests
f982d3071 Fixing bazel build
cb6b30662 Use hypot to compute the L^2 norm
1e2a24a8b Update Github actions to avoid deprecation warnings
285e5f9f4 Do not update brew formulae upon install
73d95b03f Clang-Tidy fixes
51d52c3ea Correct epsilon in CUDA QR test changed by last commit.
546f5337b Updates to CUDA dense linear algebra tests
19ab2c179 BlockRandomAccessMatrix Refactor
a3a062d72 Add a missing header
2b88bedb2 Remove unused variables
9beea728f Fix a bug in CoordinateDescentMinimizer
8e5d83f07 ClangFormat and ClangTidy changes
b15851508 Parallel operations on vectors
2fd81de12 Add build configuration with CUDA on Linux
06bfe6ffa Remove OpenMP and No threading backends.
352b320ab Fixed SuiteSparse 6.0 version parsing
8fd7828e3 ClangTidy fixes
0424615dc Fix PartitionedMatrixView usage in evaluation_benchmark
77a54dd3d Parallel updates to block-diagonal EtE FtF
946fa50de ClangTidy fixes
e4bef9505 Refactor PartitionedMatrixView to cache the partitions
d3201798e Clean up sparse_cholesky_test
addcd342f ClangTidy fixes
c2e7002d2 Remove an unused variable from evaluation_benchmark.cc
fef6d5875 Parallel left products for PartitionedMatrixView
37a3cb384 Update SuiteSparse in MSVC Github workflow
5d53d1ee3 Parallel for with iteration costs and left product
9aa52c6ff Use FindCUDAToolkit for CMake >= 3.17
47e03a6d8 Add const accessor for Problem::Options used by Problem
d6a931009 Clang Tidy Fixes
9364e31ee Fix a regression in SuiteSparse::AnalyzeCholesky
89b3e1f88 Remove unused includes of gflags and gtest
984079003 Fix missing regex dependency for gtest on QNX
6b296f27f Fix missing namespace qualification and docs for Manifold gtest macro
6685e629f AddBlockStructureTranspose to BlockSparseMatrix
699e3f3b3 Fix a link error in evaluation_benchmark.cc
19a3d07f9 Add multiplication benchmarks on BAL data
b221b1294 Format code with clang-format.
ccf32d70c Purge all remaining references to (defunct) LocalParameterization
5f8c406e2 struct ContextImpl -> class ContextImpl
9893c534c Several cleanups.
a78a57472 ClangTidy fixes
b1fe60330 Parallel right products for partitioned view
16668eedf Fix a memory leak in ContextImpl
d129938d5 Third time is the charm
4129b214a More fixes to cuda_dense_cholesky_test.cc
5c01d2573 Remove unused variables from cuda_dense_cholesky_test.cc
5e877ae69 Fix the Bazel build
d89290ffa Fix evalution_benchmark compilability
ae7f456e3 ClangTidy fixes
9438c370f Restore the semantics of TrustRegionMinimizer
c964fce90 Fix bug in cuda_kernels_test
afaad5678 Fix a typo
b7116824b Evaluation benchmark
2b89ce66f Add generalized Euler Angle conversions
8230edc6c ClangTidy fixes
9a2894763 Speed up locking when num_threads = 1.
739f2a25a Parallelize block_jacobi_preconditioner
c0c4f9394 Change implementation of parallel for
fc826c578 CUDA Cleanup
660af905f Fix a bug in TrustRegionMinimizer.
4cd257cf4 Let NumericDiffFirstOrderFunction take a dynamically sized parameter vector
6c27ac6d5 Fix repeated SpMV Benchmark
430a292ac Add a missing include
858b4b89b More ClangTidy fixes
e9e995740 ClangTidy fixes
7f8e930a0 Fix lint errors
f86a3bdbe Unify Block handling across matrix types
5f1946879 clang-formated source
00a05cf70 CUDA SDK Version-based SpMV Selection
de0f74e40 Optimize  the BlockCRSJacobiPreconditioner
ba65ddd31 Improvements to benchmarks
42352e2e2 Added CUDA Jacobi Preconditioner
f802a09ff &foo[0] -> foo.data()
344929647 Add CUDA GPU and Runtime Detection
6085e45be Minor CUDA cleanup.
e15ec89f3 Speed up bundle_adjuster
058a72782 One more ClangTidy fix.
f6f2f0d16 ClangTidy cleanups Also some clang-format cleanups.
9d74c6913 Fix some more errant CATD warnings
a4f744095 Remove an unused variable from compressed_row_sparse_matrix.cc
388c14286 Fix GCC 12.1.1 LTO -Walloc-size-larger-than= warnings
9ec4f7e44 Refactor BlockJacobiPreconditioner
adda97acd Fixed a few more missing CERES_NO_CUDA guards
22aeb3584 Fix a missing string assignment in solver.cc
3b891f767 Insert missing CUDA guards.
829089053 CUDA CGNR, Part 4: CudaCgnrSolver
6ab435d77 Fix a missing CERES_NO_CUDA guard
c560bc2be CUDA CGNR, Part 3: CudaSparseMatrix
c914c7a2b CUDA CGNR, Part 2: CudaVector
3af3dee18 Simplify the implementation to convert from BlockSparseMatrix to CompressedRowSparseMatrix.
242fc0795 Remove unnecessary destructors
737200ac8 Add macos-12 to Github workflow runners
9c968d406 Silence Clang warning
2c78c5f33 Small naming fixups.
2a25d86b0 Integrate schur power series expansion options to bundle adjuster
4642e4b0c Use if constexpr and map SparseMatrix as const
92d837953 Enable usage of schur power series expansion preconditioner.
f1dfac8cd Reduce the number of individual PRNG instances
79e403b15 Expand vcpkg installation instructions
7b0bb0e3f ClangTidy cleanups
c5c2afcc9 Fix solver_test.cc for preconditioners and sparse linear algebra libraries
07d333fb6 Refactor options checking for linear solvers
ba7207b0b A number of small changes.
20e85bbe3 Add power series expansion preconditioner
04899645c LinearOperator::FooMultiply -> LinearOperator::FooMultiplyAndAccumulate
288a3fde6 Add missing virtual destructors to matrix adapters
6483a2b4c Add Sergiu's name to the list of maintainers
1cf49d688 Update FindGlog.cmake to create glog::glog target
1da72ac39 Refactor ConjugateGradientsSolver
f62dccdb3 Fix the Sphere and Line Manifold formulations
3e1cc89f6 A bunch of clang-tidy fixes.
80380538a One more CATD fix
560ef46fb A bunch of minor fixes.
67bae28c1 CUDA CGNR, Part 1: Misc. CLeanup
5d0bca14d Remove ceres/internal/random.h in favor of <random>
d881b5ccf Minor fixes in comments
37516c968 Fix a bug in InnerProductComputer.
d8dad14ee CUDA Cleanup
c9d2ec8a9 Updates to sparse block matrix structures to support new sparse linear solvers.
5fe0bd45a Added MinGW to Windows Github workflow
738c027c1 Fix a logic error in iterative_refiner_test
cb6ad463d Add mixed precision support for CPU based DenseCholesky
df55682ba Fix Eigen error in 2D sphere manifolds Since Eigen does not allow to have a RowMajor column vector (see https://gitlab.com/libeigen/eigen/-/issues/416), the storage order must be set to ColMajor in that case. This fix adds that special case when generating 2D sphere manifolds.
1cf59f61e Set Github workflow NDK path explicitly
68c53bb39 Remove ceres::LocalParameterization
2f660464c Fix build issue with CUDA testing targets when compiling without gflags.
c801192d4 Minor fixes
ce9e902b8 Fix missing CERES_METIS_VERSION
d9a3dfbf2 Add a missing ifdef guard to dense_cholesky_test
5bd43a1fa Speed up DenseSparseMatrix::SquareColumnNorm.
cbc86f651 Fix the build when CUDA is not present
5af8e6449 Update year in solver.h
88e08cfe7 Mixed-precision Iterative Refinement Cholesky With CUDA
290b34ef0 Fix optional SuiteSparse + METIS test-suite names to be unique
d038e2d83 Fix use of NESDIS with SuiteSparse in tests if METIS is not found
027e741a1 Eliminated MinGW warning
4e5ea292b Fixed MSVC 2022 warning
83f6e0853 Fix use of conditional preprocessor checks within a macro in tests
70f1aac31 Fix fmin/fmax() when using Jets with float as their scalar type
5de77f399 Fix reporting of METIS version
11e637667 Fix #ifdef guards around METIS usage in EigenSparse backend and tests
0c88301e6 Provide optional METIS support
f11c25626 Fix fmin/fmax() to use Jet averaging on equality
b90053f1a Revert C++17 usage of std::exclusive_scan
dfce1e128 Link against threading library only if necessary
69eddfb6d Use find module to link against OpenMP
b4803778c Update documentation for linear_solver_ordering_type
2e764df06 Update Cuda memcheck test
443ae9ce2 Update Cuda memcheck test
55b4c3f44 Retain terminal formatting when building docs
786866d9f Generate version string at compile time
5bd83c4ac Unbreak the build with EIGENSPARSE is disabled
2335b5b4b Remove support for CXSparse
fbc2eea16 Nested dissection for ACCELERATE_SPARSE & EIGEN_SPARSE
d87fd551b Fix Ubuntu 20.04 workflow tests
71717f37c Use glog 0.6 release to run Windows Github workflow
66e0adfa7 Fix detection of sphinx-rtd-theme
d09f7e9d5 Enable postordering when computing the sparse factorization.
9b34ecef1 Unbreak the build on MacOS
8ba8fbb17 Remove Solver::Options::use_postordering
30b4d5df3 Fix the ceres.bzl to add missing cc files.
39ec5e8f9 Add Nested Dissection based fill reducing ordering
aa62dd86a Fix a build breakage
41c5fb1e8 Refactor suitesparse.h/cc
12263e283 Make the min. required version of SuiteSparse to be 4.5.6
c8493fc36 Convert internal enums to be class enums.
bb3a40c09 Add Nested Dissection ordering method to SuiteSparse
f1414cb5b Correct spelling in comments and docs.
fd2b0ceed Correct spelling (contiguous, BANS)
464abc198 Run Linux Github workflow on Ubuntu 22.04
caf614a6c Modernize code using c++17 constructs
be618133e Simplify some template metaprograms using fold expressions.
3b0096c1b Add the ability to specify the pivot threshold in Covariance::Options
40c1a7e18 Fix Github workflows
127474360 Ceres Solver now requires C++17
b5f1b7877 clang-format cleanup
32cd1115c Make the code in small_blas_generic.h more compiler friendly.
f68321e7d Update version history
b34280207 Fix MSVC small_blas_test failures
b246991b6 Update the citation instructions in the docs
c0c14abca Fix version history item numbering
d23dbac25 Update Windows install guide
e669c9fc7 Provide citation file
ff57c2e91 Update version history for 2.1.0rc2
ab9436cb9 Workaround MSVC STL deficiency in C++17 mode
97c232857 Update the included gtest to version 1.11.0
4eac7ddd2 Fix Jet lerp test regression
2ffbe126d Fix Jet test failures on ARMv8 with recent Xcode
0d6a0292c Fix unused arguments of Make1stOrderPerturbation
93511bfdc Fix SuiteSparse path and version reporting
bf329b30f Fix link to macOS badge
0133dada2 Add Github workflows
3d3d6ed71 Add missing includes
0a9c0df8a Fix path for cuda-memcheck tests
ee35ef66f ClangFormat cleanup via scripts/all_format.sh
470515985 Add missing includes for config.h
d3612c12c Set CMP0057 policy for IN_LIST operator in FindSuiteSparse.cmake
4bc100c13 Do not define unusable import targets
e91995cce Fix Ubuntu 18.04 shared library build
94af09186 Force C++ linker
a65e73885 Update installation docs
1a377d707 Fix Ubuntu 20.04 SPQR build
817f5a068 Switch to imported SuiteSparse, CXSparse, and METIS targets
b0f32a20d Hide remaining internal symbols
572395098 Add a missing include
b0aef211d Allow to store pointers in ProductManifold
9afe8cc45 Small compile fix to context_impl
284be88ca Allow ProductManifold default construction
f59059fff Bugfix to CUDA workspace handling
779634164 Fix MSVC linker error
7743d2e73 Store ProductManifold instances in a tuple
9f32c42ba Update Travis-CI status badge to .com from .org
eadfead69 Move LineManifold and SphereManifold into their own headers.
f0f8f93bb Fix docs inconsistencies
e40391efa Update version history in preparation for 2.1.0
6a37fbf9b Add static/compile time sizing to EuclideanManifold
4ad787ce1 Fix the bazel build
ae4d95df6 Two small clang-tidy fixes
f0851667b Fix MSVC compilation errors
c8658c899 Modernize more
46b3495a4 Standardize path handling using GNUInstallDirs
98bc3ca17 Fix shared library build due to missing compile features specification
8fe8ebc3a Add final specifier to public classes
84e1696f4 Add final specifier to internal classes.
518970f81 Context should be exported
09ec4997f Cleanup examples
90e58e10f Add missing #include.
15348abe9 Add CUDA based bundle adjustment tests.
57ec9dc92 Do not enforce a specific C++ standard
99698f053 Fix Apple Clang weak symbols warnings
8e0842162 Add support for dense CUDA solvers #3
aff51c907 Revert "Do not enforce a specific C++ standard"
527c3f7da Fixed gflags dependency
d839b7792 Do not enforce a specific C++ standard
f71167c62 Fixed missing include in minilog build
47502b833 Miscellaneous CUDA related changes.
bb2996681 Check CUDA is available in solver.cc
7d2e4152e Add support for dense CUDA solvers #2
f90833f5f Simplify symbol export
c6158e0ab Replace NULL by nullptr
7e4f5a51b Remove blas.h/cc as they are not used anymore.
e0fef6ef0 Add cmake option ENABLE_BITCODE for iOS builds
446487c54 Add <memory> header to all files using std::unique_ptr.
9c5f29d46 Use compiler attributes instead of [[deprecated]]
44039af2c Convert factory functions to return std::unique_ptrs.
708a2a723 Silence LocalParameterization deprecation warnings
de69e657a Fix another missing declaration warning
677711138 Fix segmentation fault in AVX2 builds
0141ca090 Deprecate LocalParameterizations
fdfa5184a Fix missing declaration warning
4742bf386 Add LineManifold.
c14f360e6 Drop trivial special members
ae65219e0 ClangTidy cleanups
a35bd1bf9 Use = default for trivial special members
db67e621e Fix dense_cholesky_test.cc comma handling.
484d3414e Replace virtual keyword by override
2092a720e Fix some nits.
36d6d8690 Add support for dense CUDA solvers #1
af5e48c71 Add SphereManifold.
182cb01c5 Normalize Jet classification and comparison
9dbd28989 Loosen tolerances in dense_qr_test.cc
cab853fd5 Add DenseQR Interface
8ae054ad9 Fix missing declaration warning in autodiff_manifold_test
408af7b1a Move the constructor and destructor for SchurComplementSolver
d51672d1c Move the constructor and destructor for DenseSchurComplementSolver
177b2f99d Use benchmark version 1.6 compatible syntax.
ce9669003 Add const accessor for functor wrapped by auto/numeric-diff objects
9367ec9cc Add a benchmark for dense linear solvers.
7d6524daf Support fma Jet
a0d81ad63 Fix a bug in AutoDiffManifold
5a99e42e1 ClangTidy fixes
40fb41355 Update .gitignore
e6e6ae087 Unbreak the bazel build
0572efc57 Fix a compilation warning in autodiff_manifold.h
475db73d0 Fix build breakage when LAPACK support is disabled.
095c9197f Fix iterative_refiner_test.cc
6d06e9b98 Add DenseCholesky
77c0c4d09 Migrate examples to use Manifolds
19eef54fc Rename Quaternion to QuaternionManifold. Also rename EigenQuaternion to EigenQuaternionManiold.
ca6d841c2 Add AutoDiffManifold
97d7e0737 Move the manifold testing matchers to manifold_test_utils.h
16436b34b Fix some more clang-tidy suggestions.
dcdefc216 Fix a bunch of clang-tidy suggestions.
d8a1b69ab Remove an unused variable from gradient_checker.cc
611b46b54 Remove the use of CHECK_NOTNULL.
125a0e9be LocalParameterization -> Manifold #1
00bfbae11 Add missing algorithm header to manifold.cc
1d5aff059 Refactor jet_tests.cc
fbd693091 Fix two unused variable warnings.
c0cb42e5f Add Problem::HasParameterization
7e2f9d9d4 Add EigenQuaternion manifold and tests for it.
b81a8bbb7 Add the Quaternion manifold and tests.
4a01dcb88 Add more invariants and documentation to manifold_test.cc
bdd80fcce Improve Manifold testing
ce1537030 Fixed missing headers in manifold.h
23b204d7e LocalParameterization -> Manifold #1
c2fab6502 Fix docs of supported sparse backends for mixed_precision_solves option
8cb441c49 Fix missing declaration warnings in GCC
d2b7f337c Remove split.cc from the bazel sources.
206061a6b Use standard c++ types in jet_test.cc
1f374a9fe Support promotion in comparison between Jet and scalars
06e68dbc5 Avoid midpoint overflow in the differential
276d24c73 Fix C++20 compilation
b1391e062 Support midpoint Jet
8426526df Support lerp Jet
57c279689 support 3-argument hypot jet
123fba61c Eigen::MappedSparseMatrix -> Eigen::Map<Eigen::SparseMatrix>
3f950c66d reworked copysign tests
48cb54d1b fix fmin and fmax NaN handling
552a4e517 support log10 jet
4e49c5422 reuse expm1 result for differential
8d3e64dd5 Use modern-style Eigen3 CMake variables
2fba61434 support log1p and expm1 jet
a668cabbc support norm jet
a3a4b6d77 support copysign jet
b75dac169 fix abs jet test comment
034bf566f use copysign for abs jet
31008453f Add example for BiCubicInterpolator
7ef4a1221 Add a section on implicit and inverse function theorems
686428f5c Move the further reading section to bibliography
e47d87fdd Add a note about Trigg's correction
d2852518d Fix the docs for Problem::RemoveResidualBlock & Problem::RemoveParameterBlock
06e02a173 Delete unused files split.h/cc
ac7268da5 Fix an 80cols issue in covariance_impl.cc
17dccef91 Add NumericDiffFirstOrderFunction
03d64141a Fix a incorrect check in reorder_program.cc
884111913 Two changes to TinySolver
4dff3ea2c Fix a number of typos in rotation.h
98719ced4 Fix a type in interfacing_with_autodiff.html
0299ce944 Update conf.py to be compatible with Sphinx 4.1.2
2a2b9bd6f Fix a bug in covariance_impl.cc
27fade7b8 Fix a bug in system_test.cc
d4eb83ee5 Fix the Jacobian in trust_region_minimizer_test.cc
5f6071a1c Fix a bug in local_parameterization_test.cc
b2e732b1e Fix errors in comments from William Gandler.
42f1d6717 Add accessors to GradientProblem
aefd37b18 Refactor small_blas_gemm_benchmark
dc20db303 [docs] Fix `IterationSummary` section. Add missing `IterationCallback`
90ba7d1ef [docs] Fix typos
c3129c3d4 Fix tests not executing
7de561e8e Fix dependency check for building documentation
4fbe218f2 Refactor small_blas_test
3fdde8ede Remove an errant double link.
20ad431f7 Fixing a typo in the version history
0c85c4092 Revert "Reduce copies involved in Jet operations"
3a02d5aaf Fix typo in LossFunctionWrapper sample code
7b2c223be Add fmax/fmin overloads for scalars
c036c7819 Reduce copies involved in Jet operations
51945e061 Introduce benchmark for Jet operations
ec4f2995b Do not check MaxNumThreadsAvailable if the thread number is set to 1.
98f639f54 Add a macro CERES_GET_FLAG.
766f2cab5 Reduce log spam in covariance_impl.cc.
941ea1347 Fix FindTBB version detection with TBB >= 2021.1.1
323c350a6 fix Eigen3_VERSION
2b32b3212 Revert "Group specializations into groups of four"
313caf1ae Allow Unity build.
4ba244cdb Group specializations into groups of four
d77a8100a Make miniglog's InitGoogleLogging argument const.
863724994 Use portable expression for constant 2/sqrt(pi)
97873ea65 Add some missing includes for glog/logging.h
d15b1bcd3 Increase tolerance in small_blas_test.cc
17cf01831 Hide 'format not a string literal' error in examples
64029909b Fix -Wno-maybe-uninitialized error
21294123d Fix nonnull arg compared to NULL error.
1dd417410 Fix -Wno-format-nonliteral
6c106bf51 Fix -Wmissing-field-initializers error
c48a32792 Use cc_binary includes so examples build as external repo
e0e14a5cd Fix errors found by -Werror
e84cf10e1 Fix an errant double in TinySolver.
66b4c33e8 updated unit quaternion rotation
d45ec47b5 Fix a typo in schur_eliminator.h

Change-Id: I60db062e44d051d50dbb3a145eec2f74d5190481
git-subtree-dir: third_party/ceres
git-subtree-split: 7f9cc571b03632f1df93ea35725a1f5dfffe2c72
Signed-off-by: Austin Schuh <austin.linux@gmail.com>
diff --git a/docs/CMakeLists.txt b/docs/CMakeLists.txt
index cfdd910..b5588ef 100644
--- a/docs/CMakeLists.txt
+++ b/docs/CMakeLists.txt
@@ -1,5 +1,5 @@
 # Ceres Solver - A fast non-linear least squares minimizer
-# Copyright 2015 Google Inc. All rights reserved.
+# Copyright 2023 Google Inc. All rights reserved.
 # http://ceres-solver.org/
 #
 # Redistribution and use in source and binary forms, with or without
diff --git a/docs/source/CMakeLists.txt b/docs/source/CMakeLists.txt
index 70bf998..1f8fed8 100644
--- a/docs/source/CMakeLists.txt
+++ b/docs/source/CMakeLists.txt
@@ -1,19 +1,21 @@
-find_package(Sphinx REQUIRED)
-
 # HTML output directory
 set(SPHINX_HTML_DIR "${Ceres_BINARY_DIR}/docs/html")
 
 # Install documentation
 install(DIRECTORY ${SPHINX_HTML_DIR}
-        DESTINATION "${CERES_DOCS_INSTALL_DIR}"
+        DESTINATION ${CMAKE_INSTALL_DOCDIR}
         COMPONENT Doc
         PATTERN "${SPHINX_HTML_DIR}/*")
 
+# Find python
+find_package(Python REQUIRED COMPONENTS Interpreter)
+
 # Building using 'make_docs.py' python script
 add_custom_target(ceres_docs ALL
-                  python
+                  $<TARGET_FILE:Python::Interpreter>
                   "${Ceres_SOURCE_DIR}/scripts/make_docs.py"
                   "${Ceres_SOURCE_DIR}"
                   "${Ceres_BINARY_DIR}/docs"
-                  "${SPHINX_EXECUTABLE}"
+                  "${Sphinx_BUILD_EXECUTABLE}"
+                  USES_TERMINAL
                   COMMENT "Building HTML documentation with Sphinx")
diff --git a/docs/source/automatic_derivatives.rst b/docs/source/automatic_derivatives.rst
index e15e911..3fe0727 100644
--- a/docs/source/automatic_derivatives.rst
+++ b/docs/source/automatic_derivatives.rst
@@ -37,9 +37,7 @@
   };
 
 
-  CostFunction* cost_function =
-        new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
-          new Rat43CostFunctor(x, y));
+  auto* cost_function = new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(x, y);
 
 Notice that compared to numeric differentiation, the only difference
 when defining the functor for use with automatic differentiation is
@@ -298,7 +296,7 @@
 
 There is no single solution to this problem. In some cases one needs
 to reason explicitly about the points where indeterminacy may occur
-and use alternate expressions using `L'Hopital's rule
+and use alternate expressions using `L'Hôpital's rule
 <https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
 example some of the conversion routines in `rotation.h
 <https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_. In
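
The hunk above points at the conversion routines in rotation.h as examples of reasoning about points of indeterminacy. As a purely illustrative sketch (not part of this patch), the usual pattern for an expression like sin(x)/x, which is indeterminate at x = 0 when evaluated with Jets, looks like the following; the cutoff value kThreshold is an arbitrary choice made only for this example.

.. code-block:: c++

   // Illustrative only: evaluate sin(x)/x so that both the value and the
   // derivative stay finite at x = 0. Near zero we switch to the Taylor
   // expansion sin(x)/x = 1 - x^2/6 + O(x^4), which is well defined for
   // Jets; kThreshold is an arbitrary cutoff chosen for this sketch.
   template <typename T>
   T Sinc(const T& x) {
     using std::abs;
     using std::sin;
     const double kThreshold = 1e-4;
     if (abs(x) > kThreshold) {
       return sin(x) / x;
     }
     return T(1.0) - x * x / T(6.0);
   }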
diff --git a/docs/source/bibliography.rst b/docs/source/bibliography.rst
index c13c676..ba3bc87 100644
--- a/docs/source/bibliography.rst
+++ b/docs/source/bibliography.rst
@@ -4,6 +4,20 @@
 Bibliography
 ============
 
+Background Reading
+==================
+
+For a short but informative introduction to the subject we recommend
+the booklet by [Madsen]_ . For a general introduction to non-linear
+optimization we recommend [NocedalWright]_. [Bjorck]_ remains the
+seminal reference on least squares problems. [TrefethenBau]_ is our
+favorite text on introductory numerical linear algebra. [Triggs]_
+provides a thorough coverage of the bundle adjustment problem.
+
+
+References
+==========
+
 .. [Agarwal] S. Agarwal, N. Snavely, S. M. Seitz and R. Szeliski,
    **Bundle Adjustment in the Large**, *Proceedings of the European
    Conference on Computer Vision*, pp. 29--42, 2010.
@@ -31,6 +45,9 @@
 .. [Conn] A.R. Conn, N.I.M. Gould, and P.L. Toint, **Trust region
    methods**, *Society for Industrial Mathematics*, 2000.
 
+.. [Davis] Timothy A. Davis, **Direct methods for Sparse Linear
+   Systems**, *SIAM*, 2006.
+
 .. [Dellaert] F. Dellaert, J. Carlson, V. Ila, K. Ni and C. E. Thorpe,
    **Subgraph-preconditioned conjugate gradients for large scale SLAM**,
    *International Conference on Intelligent Robots and Systems*, 2010.
@@ -44,7 +61,7 @@
    Preconditioners for Sparse Linear Least-Squares Problems**,
    *ACM Trans. Math. Softw.*, 43(4), 2017.
 
-.. [HartleyZisserman] R.I. Hartley & A. Zisserman, **Multiview
+.. [HartleyZisserman] R.I. Hartley and A. Zisserman, **Multiview
    Geometry in Computer Vision**, Cambridge University Press, 2004.
 
 .. [Hertzberg] C. Hertzberg, R. Wagner, U. Frese and L. Schroder,
@@ -79,6 +96,11 @@
    preconditioner for large sparse least squares problems**, *SIAM
    Journal on Matrix Analysis and Applications*, 28(2):524-550, 2007.
 
+.. [LourakisArgyros] M. L. A. Lourakis, A. A. Argyros, **Is
+   Levenberg-Marquardt the most efficient algorithm for implementing
+   bundle adjustment?**, *International Conference on Computer
+   Vision*, 2005.
+
 .. [Madsen] K. Madsen, H.B. Nielsen, and O. Tingleff, **Methods for
    nonlinear least squares problems**, 2004.
 
@@ -100,7 +122,7 @@
 .. [Nocedal] J. Nocedal, **Updating Quasi-Newton Matrices with Limited
    Storage**, *Mathematics of Computation*, 35(151): 773--782, 1980.
 
-.. [NocedalWright] J. Nocedal & S. Wright, **Numerical Optimization**,
+.. [NocedalWright] J. Nocedal and S. Wright, **Numerical Optimization**,
    Springer, 2004.
 
 .. [Oren] S. S. Oren, **Self-scaling Variable Metric (SSVM) Algorithms
@@ -108,7 +130,7 @@
    20(5), 863-874, 1974.
 
 .. [Press] W. H. Press, S. A. Teukolsky, W. T. Vetterling
-   & B. P. Flannery, **Numerical Recipes**, Cambridge University
+   and B. P. Flannery, **Numerical Recipes**, Cambridge University
    Press, 2007.
 
 .. [Ridders] C. J. F. Ridders, **Accurate computation of F'(x) and
@@ -122,27 +144,37 @@
    systems**, SIAM, 2003.
 
 .. [Simon] I. Simon, N. Snavely and S. M. Seitz, **Scene Summarization
-   for Online Image Collections**, *International Conference on Computer Vision*, 2007.
+   for Online Image Collections**, *International Conference on
+   Computer Vision*, 2007.
 
 .. [Stigler] S. M. Stigler, **Gauss and the invention of least
    squares**, *The Annals of Statistics*, 9(3):465-474, 1981.
 
-.. [TenenbaumDirector] J. Tenenbaum & B. Director, **How Gauss
+.. [TenenbaumDirector] J. Tenenbaum and B. Director, **How Gauss
    Determined the Orbit of Ceres**.
 
 .. [TrefethenBau] L.N. Trefethen and D. Bau, **Numerical Linear
    Algebra**, SIAM, 1997.
 
-.. [Triggs] B. Triggs, P. F. Mclauchlan, R. I. Hartley &
+.. [Triggs] B. Triggs, P. F. Mclauchlan, R. I. Hartley and
    A. W. Fitzgibbon, **Bundle Adjustment: A Modern Synthesis**,
    Proceedings of the International Workshop on Vision Algorithms:
    Theory and Practice, pp. 298-372, 1999.
 
+.. [Weber] S. Weber, N. Demmel, TC Chan, D. Cremers, **Power Bundle
+   Adjustment for Large-Scale 3D Reconstruction**, *IEEE Conference on
+   Computer Vision and Pattern Recognition*, 2023.
+
 .. [Wiberg] T. Wiberg, **Computation of principal components when data
    are missing**, In Proc. *Second Symp. Computational Statistics*,
    pages 229-236, 1976.
 
-.. [WrightHolt] S. J. Wright and J. N. Holt, **An Inexact
-   Levenberg Marquardt Method for Large Sparse Nonlinear Least
-   Squares**, *Journal of the Australian Mathematical Society Series
-   B*, 26(4):387-403, 1985.
+.. [WrightHolt] S. J. Wright and J. N. Holt, **An Inexact Levenberg
+   Marquardt Method for Large Sparse Nonlinear Least Squares**,
+   *Journal of the Australian Mathematical Society Series B*,
+   26(4):387-403, 1985.
+
+.. [Zheng] Q. Zheng, Y. Xi and Y. Saad, **A power Schur Complement
+   low-rank correction preconditioner for general sparse linear
+   systems**, *SIAM Journal on Matrix Analysis and
+   Applications*, 2021.
diff --git a/docs/source/conf.py b/docs/source/conf.py
index c83468f..faa2403 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -25,7 +25,7 @@
 
 # Add any Sphinx extension module names here, as strings. They can be extensions
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
-extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax', 'sphinx.ext.ifconfig']
+extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax', 'sphinx.ext.ifconfig', 'sphinxcontrib.jquery']
 
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
@@ -41,16 +41,16 @@
 
 # General information about the project.
 project = u'Ceres Solver'
-copyright = u'2020 Google Inc'
+copyright = u'2023 Google Inc'
 
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
 # built documents.
 #
 # The short X.Y version.
-version = '2.0'
+version = '2.2'
 # The full version, including alpha/beta/rc tags.
-release = '2.0.0'
+release = '2.2.0'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
@@ -246,7 +246,7 @@
 # By default MathJax does not use TeX fonts, which is a tragedy. Also
 # scaling the fonts down a bit makes them fit better with font sizing
 # in the "Read The Docs" theme.
-mathjax_config = {
+mathjax3_config = {
 'HTML-CSS' : {
         'availableFonts' : ["TeX"],
         'scale' : 90
diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst
index a128e30..cba274c 100644
--- a/docs/source/contributing.rst
+++ b/docs/source/contributing.rst
@@ -118,8 +118,8 @@
 
    When the push succeeds, the console will display a URL showing the
    address of the review. Go to the URL and add at least one of the
-   maintainers (Sameer Agarwal, Keir Mierle, Alex Stewart or William
-   Rucklidge) as reviewers.
+   maintainers (Sameer Agarwal, Keir Mierle, Alex Stewart, William
+   Rucklidge or Sergiu Deitsch) as reviewers.
 
 3. Wait for a review.
 
diff --git a/docs/source/derivatives.rst b/docs/source/derivatives.rst
index bff6a29..d9a52b0 100644
--- a/docs/source/derivatives.rst
+++ b/docs/source/derivatives.rst
@@ -58,3 +58,4 @@
    numerical_derivatives
    automatic_derivatives
    interfacing_with_autodiff
+   inverse_and_implicit_function_theorems
diff --git a/docs/source/faqs.rst b/docs/source/faqs.rst
index 5a28f41..65c64e6 100644
--- a/docs/source/faqs.rst
+++ b/docs/source/faqs.rst
@@ -16,14 +16,3 @@
 
    modeling_faqs
    solving_faqs
-
-
-Further Reading
-===============
-
-For a short but informative introduction to the subject we recommend
-the booklet by [Madsen]_ . For a general introduction to non-linear
-optimization we recommend [NocedalWright]_. [Bjorck]_ remains the
-seminal reference on least squares problems. [TrefethenBau]_ book is
-our favorite text on introductory numerical linear algebra. [Triggs]_
-provides a thorough coverage of the bundle adjustment problem.
diff --git a/docs/source/features.rst b/docs/source/features.rst
index 724d6dc..609f41c 100644
--- a/docs/source/features.rst
+++ b/docs/source/features.rst
@@ -1,11 +1,15 @@
+.. default-domain:: cpp
+
+.. cpp:namespace:: ceres
+
 ====
 Why?
 ====
 .. _chapter-features:
 
 * **Code Quality** - Ceres Solver has been used in production at
-  Google for more than four years now. It is clean, extensively tested
-  and well documented code that is actively developed and supported.
+  Google since 2011. It is clean, extensively tested and well
+  documented code that is actively developed and supported.
 
 * **Modeling API** - It is rarely the case that one starts with the
   exact and complete formulation of the problem that one is trying to
@@ -27,10 +31,10 @@
     allows the user to *shape* their residuals using a
     :class:`LossFunction` to reduce the influence of outliers.
 
-  - **Local Parameterization** In many cases, some parameters lie on a
-    manifold other than Euclidean space, e.g., rotation matrices. In
-    such cases, the user can specify the geometry of the local tangent
-    space by specifying a :class:`LocalParameterization` object.
+  - **Manifolds** In many cases, some parameters lie on a manifold
+    other than Euclidean space, e.g., rotation matrices. In such
+    cases, the user can specify the geometry of the local tangent
+    space by specifying a :class:`Manifold` object.
 
 * **Solver Choice** Depending on the size, sparsity structure, time &
   memory budgets, and solution quality requirements, different
@@ -42,10 +46,11 @@
     computational cost in all of these methods is the solution of a
     linear system. To this end Ceres ships with a variety of linear
     solvers - dense QR and dense Cholesky factorization (using
-    `Eigen`_ or `LAPACK`_) for dense problems, sparse Cholesky
-    factorization (`SuiteSparse`_, `CXSparse`_ or `Eigen`_) for large
-    sparse problems, custom Schur complement based dense, sparse, and
-    iterative linear solvers for `bundle adjustment`_ problems.
+    `Eigen`_, `LAPACK`_ or `CUDA`_) for dense problems, sparse
+    Cholesky factorization (`SuiteSparse`_, `Apple's Accelerate`_,
+    `Eigen`_) for large sparse problems, custom Schur complement based
+    dense, sparse, and iterative linear solvers for `bundle
+    adjustment`_ problems.
 
   - **Line Search Solvers** - When the problem size is so large that
     storing and factoring the Jacobian is not feasible or a low
@@ -54,18 +59,21 @@
     of Non-linear Conjugate Gradients, BFGS and LBFGS.
 
 * **Speed** - Ceres Solver has been extensively optimized, with C++
-  templating, hand written linear algebra routines and OpenMP or
-  modern C++ threads based multithreading of the Jacobian evaluation
-  and the linear solvers.
+  templating, hand written linear algebra routines and modern C++
+  threads based multithreading of the Jacobian evaluation and the
+  linear solvers.
 
-* **Solution Quality** Ceres is the `best performing`_ solver on the NIST
-  problem set used by Mondragon and Borchers for benchmarking
+* **GPU Acceleration** If your system supports `CUDA`_ then Ceres
+  Solver can use the Nvidia GPU on your system to speed up the solver.
+
+* **Solution Quality** Ceres is the `best performing`_ solver on the
+  NIST problem set used by Mondragon and Borchers for benchmarking
   non-linear least squares solvers.
 
 * **Covariance estimation** - Evaluate the sensitivity/uncertainty of
   the solution by evaluating all or part of the covariance
-  matrix. Ceres is one of the few solvers that allows you to do
-  this analysis at scale.
+  matrix. Ceres is one of the few solvers that allows you to do this
+  analysis at scale.
 
 * **Community** Since its release as an open source software, Ceres
   has developed an active developer community that contributes new
@@ -82,6 +90,7 @@
 .. _SuiteSparse: http://www.cise.ufl.edu/research/sparse/SuiteSparse/
 .. _Eigen: http://eigen.tuxfamily.org/
 .. _LAPACK: http://www.netlib.org/lapack/
-.. _CXSparse: https://www.cise.ufl.edu/research/sparse/CXSparse/
 .. _automatic: http://en.wikipedia.org/wiki/Automatic_differentiation
 .. _numeric: http://en.wikipedia.org/wiki/Numerical_differentiation
+.. _CUDA : https://developer.nvidia.com/cuda-toolkit
+.. _Apple's Accelerate: https://developer.apple.com/documentation/accelerate/sparse_solvers
diff --git a/docs/source/gradient_solver.rst b/docs/source/gradient_solver.rst
index dde9d7e..4e3fc71 100644
--- a/docs/source/gradient_solver.rst
+++ b/docs/source/gradient_solver.rst
@@ -2,6 +2,8 @@
 
 .. default-domain:: cpp
 
+.. cpp:namespace:: ceres
+
 .. _chapter-gradient_problem_solver:
 
 ==================================
@@ -54,9 +56,9 @@
    public:
     explicit GradientProblem(FirstOrderFunction* function);
     GradientProblem(FirstOrderFunction* function,
-                    LocalParameterization* parameterization);
+                    Manifold* manifold);
     int NumParameters() const;
-    int NumLocalParameters() const;
+    int NumTangentParameters() const;
     bool Evaluate(const double* parameters, double* cost, double* gradient) const;
     bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
   };
@@ -69,20 +71,18 @@
 form of the objective function.
 
 Structurally :class:`GradientProblem` is a composition of a
-:class:`FirstOrderFunction` and optionally a
-:class:`LocalParameterization`.
+:class:`FirstOrderFunction` and optionally a :class:`Manifold`.
 
 The :class:`FirstOrderFunction` is responsible for evaluating the cost
 and gradient of the objective function.
 
-The :class:`LocalParameterization` is responsible for going back and
-forth between the ambient space and the local tangent space. When a
-:class:`LocalParameterization` is not provided, then the tangent space
-is assumed to coincide with the ambient Euclidean space that the
-gradient vector lives in.
+The :class:`Manifold` is responsible for going back and forth between the
+ambient space and the local tangent space. When a :class:`Manifold` is not
+provided, then the tangent space is assumed to coincide with the ambient
+Euclidean space that the gradient vector lives in.
 
 The constructor takes ownership of the :class:`FirstOrderFunction` and
-:class:`LocalParamterization` objects passed to it.
+:class:`Manifold` objects passed to it.
 
 
 .. function:: void Solve(const GradientProblemSolver::Options& options, const GradientProblem& problem, double* parameters, GradientProblemSolver::Summary* summary)
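
Putting the pieces above together, here is a minimal, purely illustrative sketch (not part of this patch) that exercises the documented interface: a hypothetical first-order function ``MyCost`` whose four ambient parameters form a unit quaternion, combined with a ``QuaternionManifold``. The ownership semantics and the ``Solve()`` signature are as described in this chapter.

.. code-block:: c++

   #include <iostream>
   #include "ceres/ceres.h"

   // Hypothetical cost for this sketch: f(q) = 1 - q_w on a unit quaternion.
   class MyCost final : public ceres::FirstOrderFunction {
    public:
     bool Evaluate(const double* x, double* cost, double* gradient) const override {
       cost[0] = 1.0 - x[0];
       if (gradient != nullptr) {
         gradient[0] = -1.0;
         gradient[1] = 0.0;
         gradient[2] = 0.0;
         gradient[3] = 0.0;
       }
       return true;
     }
     int NumParameters() const override { return 4; }
   };

   int main() {
     double parameters[4] = {0.5, 0.5, 0.5, 0.5};
     // The problem takes ownership of both raw pointers.
     ceres::GradientProblem problem(new MyCost(), new ceres::QuaternionManifold());
     ceres::GradientProblemSolver::Options options;
     ceres::GradientProblemSolver::Summary summary;
     ceres::Solve(options, problem, parameters, &summary);
     std::cout << summary.BriefReport() << "\n";
     return 0;
   }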
@@ -103,7 +103,7 @@
    behavior of the solver. We list the various settings and their
    default values below.
 
-.. function:: bool GradientProblemSolver::Options::IsValid(string* error) const
+.. function:: bool GradientProblemSolver::Options::IsValid(std::string* error) const
 
    Validate the values in the options struct and returns true on
    success. If there is a problem, the method returns false with
@@ -123,7 +123,7 @@
    Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
    Note that in order for the assumptions underlying the ``BFGS`` and
    ``LBFGS`` line search direction algorithms to be guaranteed to be
-   satisifed, the ``WOLFE`` line search should be used.
+   satisfied, the ``WOLFE`` line search should be used.
 
 .. member:: NonlinearConjugateGradientType GradientProblemSolver::Options::nonlinear_conjugate_gradient_type
 
@@ -192,7 +192,7 @@
   low-sensitivity parameters. It can also reduce the robustness of the
   solution to errors in the Jacobians.
 
-.. member:: LineSearchIterpolationType GradientProblemSolver::Options::line_search_interpolation_type
+.. member:: LineSearchInterpolationType GradientProblemSolver::Options::line_search_interpolation_type
 
    Default: ``CUBIC``
 
@@ -342,8 +342,8 @@
 
    where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
    is projection onto the bounds constraints and :math:`\boxplus` is
-   Plus operation for the overall local parameterization associated
-   with the parameter vector.
+   Plus operation for the manifold associated with the parameter
+   vector.
 
 .. member:: double GradientProblemSolver::Options::parameter_tolerance
 
@@ -388,14 +388,14 @@
    #. ``it`` is the time take by the current iteration.
    #. ``tt`` is the total time taken by the minimizer.
 
-.. member:: vector<IterationCallback> GradientProblemSolver::Options::callbacks
+.. member:: std::vector<IterationCallback> GradientProblemSolver::Options::callbacks
 
    Callbacks that are executed at the end of each iteration of the
    :class:`Minimizer`. They are executed in the order that they are
    specified in this vector. By default, parameter blocks are updated
    only at the end of the optimization, i.e., when the
    :class:`Minimizer` terminates. This behavior is controlled by
-   :member:`GradientProblemSolver::Options::update_state_every_variable`. If
+   :member:`GradientProblemSolver::Options::update_state_every_iteration`. If
    the user wishes to have access to the update parameter blocks when
    his/her callbacks are executed, then set
    :member:`GradientProblemSolver::Options::update_state_every_iteration`
@@ -404,7 +404,7 @@
    The solver does NOT take ownership of these pointers.
 
 
-.. member:: bool Solver::Options::update_state_every_iteration
+.. member:: bool GradientProblemSolver::Options::update_state_every_iteration
 
    Default: ``false``
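
As a purely illustrative sketch (not part of this patch), a minimal callback registered through ``GradientProblemSolver::Options::callbacks`` might look as follows; ``LoggingCallback`` is a hypothetical name chosen for this example.

.. code-block:: c++

   #include <iostream>
   #include "ceres/ceres.h"

   // Logs the cost at the end of every iteration and lets the solver continue.
   class LoggingCallback final : public ceres::IterationCallback {
    public:
     ceres::CallbackReturnType operator()(
         const ceres::IterationSummary& summary) override {
       std::cout << "iteration " << summary.iteration
                 << " cost " << summary.cost << "\n";
       return ceres::SOLVER_CONTINUE;
     }
   };

   // The solver does not take ownership, so the callback must outlive Solve():
   //   LoggingCallback callback;
   //   options.callbacks.push_back(&callback);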
 
@@ -420,12 +420,12 @@
 
    Summary of the various stages of the solver after termination.
 
-.. function:: string GradientProblemSolver::Summary::BriefReport() const
+.. function:: std::string GradientProblemSolver::Summary::BriefReport() const
 
    A brief one line description of the state of the solver after
    termination.
 
-.. function:: string GradientProblemSolver::Summary::FullReport() const
+.. function:: std::string GradientProblemSolver::Summary::FullReport() const
 
    A full multiline description of the state of the solver after
    termination.
@@ -444,7 +444,7 @@
 
    The cause of the minimizer terminating.
 
-.. member:: string GradientProblemSolver::Summary::message
+.. member:: std::string GradientProblemSolver::Summary::message
 
    Reason why the solver terminated.
 
@@ -458,7 +458,7 @@
    Cost of the problem (value of the objective function) after the
    optimization.
 
-.. member:: vector<IterationSummary> GradientProblemSolver::Summary::iterations
+.. member:: std::vector<IterationSummary> GradientProblemSolver::Summary::iterations
 
    :class:`IterationSummary` for each minimizer iteration in order.
 
@@ -486,11 +486,11 @@
 
    Number of parameters in the problem.
 
-.. member:: int GradientProblemSolver::Summary::num_local_parameters
+.. member:: int GradientProblemSolver::Summary::num_tangent_parameters
 
    Dimension of the tangent space of the problem. This is different
    from :member:`GradientProblemSolver::Summary::num_parameters` if a
-   :class:`LocalParameterization` object is used.
+   :class:`Manifold` object is used.
 
 .. member:: LineSearchDirectionType GradientProblemSolver::Summary::line_search_direction_type
 
diff --git a/docs/source/gradient_tutorial.rst b/docs/source/gradient_tutorial.rst
index 3fef6b6..2af44e1 100644
--- a/docs/source/gradient_tutorial.rst
+++ b/docs/source/gradient_tutorial.rst
@@ -2,52 +2,53 @@
 
 .. default-domain:: cpp
 
+.. cpp:namespace:: ceres
+
 .. _chapter-gradient_tutorial:
 
 ==================================
 General Unconstrained Minimization
 ==================================
 
-While much of Ceres Solver is devoted to solving non-linear least
-squares problems, internally it contains a solver that can solve
-general unconstrained optimization problems using just their objective
-function value and gradients. The ``GradientProblem`` and
-``GradientProblemSolver`` objects give the user access to this solver.
-
-So without much further ado, let us look at how one goes about using
-them.
+Ceres Solver besides being able to solve non-linear least squares
+problem can also solve general unconstrained problems using just their
+objective function value and gradients. In this chapter we will see
+how to do this.
 
 Rosenbrock's Function
 =====================
 
-We consider the minimization of the famous `Rosenbrock's function
+Consider minimizing the famous `Rosenbrock's function
 <http://en.wikipedia.org/wiki/Rosenbrock_function>`_ [#f1]_.
 
-We begin by defining an instance of the ``FirstOrderFunction``
-interface. This is the object that is responsible for computing the
-objective function value and the gradient (if required). This is the
-analog of the :class:`CostFunction` when defining non-linear least
-squares problems in Ceres.
+The simplest way to minimize is to define a templated functor to
+evaluate the objective value of this function and then use Ceres
+Solver's automatic differentiation to compute its derivatives.
+
+We begin by defining a templated functor and then using
+``AutoDiffFirstOrderFunction`` to construct an instance of the
+``FirstOrderFunction`` interface. This is the object that is
+responsible for computing the objective function value and the
+gradient (if required). This is the analog of the
+:class:`CostFunction` when defining non-linear least squares problems
+in Ceres.
 
 .. code::
 
-  class Rosenbrock : public ceres::FirstOrderFunction {
-   public:
-    virtual bool Evaluate(const double* parameters,
-                          double* cost,
-                          double* gradient) const {
-      const double x = parameters[0];
-      const double y = parameters[1];
-
+  // f(x,y) = (1-x)^2 + 100(y - x^2)^2;
+  struct Rosenbrock {
+    template <typename T>
+    bool operator()(const T* parameters, T* cost) const {
+      const T x = parameters[0];
+      const T y = parameters[1];
       cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
-      if (gradient != nullptr) {
-        gradient[0] = -2.0 * (1.0 - x) - 200.0 * (y - x * x) * 2.0 * x;
-        gradient[1] = 200.0 * (y - x * x);
-      }
       return true;
     }
 
-    virtual int NumParameters() const { return 2; }
+    static ceres::FirstOrderFunction* Create() {
+      constexpr int kNumParameters = 2;
+      return new ceres::AutoDiffFirstOrderFunction<Rosenbrock, kNumParameters>();
+    }
   };
 
 
@@ -58,7 +59,7 @@
 
     double parameters[2] = {-1.2, 1.0};
 
-    ceres::GradientProblem problem(new Rosenbrock());
+    ceres::GradientProblem problem(Rosenbrock::Create());
 
     ceres::GradientProblemSolver::Options options;
     options.minimizer_progress_to_stdout = true;
@@ -74,65 +75,130 @@
 
 .. code-block:: bash
 
-     0: f: 2.420000e+01 d: 0.00e+00 g: 2.16e+02 h: 0.00e+00 s: 0.00e+00 e:  0 it: 2.00e-05 tt: 2.00e-05
-     1: f: 4.280493e+00 d: 1.99e+01 g: 1.52e+01 h: 2.01e-01 s: 8.62e-04 e:  2 it: 7.32e-05 tt: 2.19e-04
-     2: f: 3.571154e+00 d: 7.09e-01 g: 1.35e+01 h: 3.78e-01 s: 1.34e-01 e:  3 it: 2.50e-05 tt: 2.68e-04
-     3: f: 3.440869e+00 d: 1.30e-01 g: 1.73e+01 h: 1.36e-01 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 2.92e-04
-     4: f: 3.213597e+00 d: 2.27e-01 g: 1.55e+01 h: 1.06e-01 s: 4.59e-01 e:  1 it: 2.86e-06 tt: 3.14e-04
-     5: f: 2.839723e+00 d: 3.74e-01 g: 1.05e+01 h: 1.34e-01 s: 5.24e-01 e:  1 it: 2.86e-06 tt: 3.36e-04
-     6: f: 2.448490e+00 d: 3.91e-01 g: 1.29e+01 h: 3.04e-01 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 3.58e-04
-     7: f: 1.943019e+00 d: 5.05e-01 g: 4.00e+00 h: 8.81e-02 s: 7.43e-01 e:  1 it: 4.05e-06 tt: 3.79e-04
-     8: f: 1.731469e+00 d: 2.12e-01 g: 7.36e+00 h: 1.71e-01 s: 4.60e-01 e:  2 it: 9.06e-06 tt: 4.06e-04
-     9: f: 1.503267e+00 d: 2.28e-01 g: 6.47e+00 h: 8.66e-02 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 4.33e-04
-    10: f: 1.228331e+00 d: 2.75e-01 g: 2.00e+00 h: 7.70e-02 s: 7.90e-01 e:  1 it: 3.81e-06 tt: 4.54e-04
-    11: f: 1.016523e+00 d: 2.12e-01 g: 5.15e+00 h: 1.39e-01 s: 3.76e-01 e:  2 it: 1.00e-05 tt: 4.82e-04
-    12: f: 9.145773e-01 d: 1.02e-01 g: 6.74e+00 h: 7.98e-02 s: 1.00e+00 e:  1 it: 3.10e-06 tt: 5.03e-04
-    13: f: 7.508302e-01 d: 1.64e-01 g: 3.88e+00 h: 5.76e-02 s: 4.93e-01 e:  1 it: 2.86e-06 tt: 5.25e-04
-    14: f: 5.832378e-01 d: 1.68e-01 g: 5.56e+00 h: 1.42e-01 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 5.47e-04
-    15: f: 3.969581e-01 d: 1.86e-01 g: 1.64e+00 h: 1.17e-01 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 5.68e-04
-    16: f: 3.171557e-01 d: 7.98e-02 g: 3.84e+00 h: 1.18e-01 s: 3.97e-01 e:  2 it: 9.06e-06 tt: 5.94e-04
-    17: f: 2.641257e-01 d: 5.30e-02 g: 3.27e+00 h: 6.14e-02 s: 1.00e+00 e:  1 it: 3.10e-06 tt: 6.16e-04
-    18: f: 1.909730e-01 d: 7.32e-02 g: 5.29e-01 h: 8.55e-02 s: 6.82e-01 e:  1 it: 4.05e-06 tt: 6.42e-04
-    19: f: 1.472012e-01 d: 4.38e-02 g: 3.11e+00 h: 1.20e-01 s: 3.47e-01 e:  2 it: 1.00e-05 tt: 6.69e-04
-    20: f: 1.093558e-01 d: 3.78e-02 g: 2.97e+00 h: 8.43e-02 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 6.91e-04
-    21: f: 6.710346e-02 d: 4.23e-02 g: 1.42e+00 h: 9.64e-02 s: 8.85e-01 e:  1 it: 3.81e-06 tt: 7.12e-04
-    22: f: 3.993377e-02 d: 2.72e-02 g: 2.30e+00 h: 1.29e-01 s: 4.63e-01 e:  2 it: 9.06e-06 tt: 7.39e-04
-    23: f: 2.911794e-02 d: 1.08e-02 g: 2.55e+00 h: 6.55e-02 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 7.62e-04
-    24: f: 1.457683e-02 d: 1.45e-02 g: 2.77e-01 h: 6.37e-02 s: 6.14e-01 e:  1 it: 3.81e-06 tt: 7.84e-04
-    25: f: 8.577515e-03 d: 6.00e-03 g: 2.86e+00 h: 1.40e-01 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 8.05e-04
-    26: f: 3.486574e-03 d: 5.09e-03 g: 1.76e-01 h: 1.23e-02 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 8.27e-04
-    27: f: 1.257570e-03 d: 2.23e-03 g: 1.39e-01 h: 5.08e-02 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 8.48e-04
-    28: f: 2.783568e-04 d: 9.79e-04 g: 6.20e-01 h: 6.47e-02 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 8.69e-04
-    29: f: 2.533399e-05 d: 2.53e-04 g: 1.68e-02 h: 1.98e-03 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 8.91e-04
-    30: f: 7.591572e-07 d: 2.46e-05 g: 5.40e-03 h: 9.27e-03 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 9.12e-04
-    31: f: 1.902460e-09 d: 7.57e-07 g: 1.62e-03 h: 1.89e-03 s: 1.00e+00 e:  1 it: 2.86e-06 tt: 9.33e-04
-    32: f: 1.003030e-12 d: 1.90e-09 g: 3.50e-05 h: 3.52e-05 s: 1.00e+00 e:  1 it: 3.10e-06 tt: 9.54e-04
-    33: f: 4.835994e-17 d: 1.00e-12 g: 1.05e-07 h: 1.13e-06 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 9.81e-04
-    34: f: 1.885250e-22 d: 4.84e-17 g: 2.69e-10 h: 1.45e-08 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 1.00e-03
+       0: f: 2.420000e+01 d: 0.00e+00 g: 2.16e+02 h: 0.00e+00 s: 0.00e+00 e:  0 it: 1.19e-05 tt: 1.19e-05
+       1: f: 4.280493e+00 d: 1.99e+01 g: 1.52e+01 h: 2.01e-01 s: 8.62e-04 e:  2 it: 7.30e-05 tt: 1.72e-04
+       2: f: 3.571154e+00 d: 7.09e-01 g: 1.35e+01 h: 3.78e-01 s: 1.34e-01 e:  3 it: 1.60e-05 tt: 1.93e-04
+       3: f: 3.440869e+00 d: 1.30e-01 g: 1.73e+01 h: 1.36e-01 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 1.97e-04
+       4: f: 3.213597e+00 d: 2.27e-01 g: 1.55e+01 h: 1.06e-01 s: 4.59e-01 e:  1 it: 1.19e-06 tt: 2.00e-04
+       5: f: 2.839723e+00 d: 3.74e-01 g: 1.05e+01 h: 1.34e-01 s: 5.24e-01 e:  1 it: 9.54e-07 tt: 2.03e-04
+       6: f: 2.448490e+00 d: 3.91e-01 g: 1.29e+01 h: 3.04e-01 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 2.05e-04
+       7: f: 1.943019e+00 d: 5.05e-01 g: 4.00e+00 h: 8.81e-02 s: 7.43e-01 e:  1 it: 9.54e-07 tt: 2.08e-04
+       8: f: 1.731469e+00 d: 2.12e-01 g: 7.36e+00 h: 1.71e-01 s: 4.60e-01 e:  2 it: 2.15e-06 tt: 2.11e-04
+       9: f: 1.503267e+00 d: 2.28e-01 g: 6.47e+00 h: 8.66e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 2.14e-04
+      10: f: 1.228331e+00 d: 2.75e-01 g: 2.00e+00 h: 7.70e-02 s: 7.90e-01 e:  1 it: 0.00e+00 tt: 2.16e-04
+      11: f: 1.016523e+00 d: 2.12e-01 g: 5.15e+00 h: 1.39e-01 s: 3.76e-01 e:  2 it: 1.91e-06 tt: 2.25e-04
+      12: f: 9.145773e-01 d: 1.02e-01 g: 6.74e+00 h: 7.98e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 2.28e-04
+      13: f: 7.508302e-01 d: 1.64e-01 g: 3.88e+00 h: 5.76e-02 s: 4.93e-01 e:  1 it: 9.54e-07 tt: 2.30e-04
+      14: f: 5.832378e-01 d: 1.68e-01 g: 5.56e+00 h: 1.42e-01 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 2.33e-04
+      15: f: 3.969581e-01 d: 1.86e-01 g: 1.64e+00 h: 1.17e-01 s: 1.00e+00 e:  1 it: 1.19e-06 tt: 2.36e-04
+      16: f: 3.171557e-01 d: 7.98e-02 g: 3.84e+00 h: 1.18e-01 s: 3.97e-01 e:  2 it: 1.91e-06 tt: 2.39e-04
+      17: f: 2.641257e-01 d: 5.30e-02 g: 3.27e+00 h: 6.14e-02 s: 1.00e+00 e:  1 it: 1.19e-06 tt: 2.42e-04
+      18: f: 1.909730e-01 d: 7.32e-02 g: 5.29e-01 h: 8.55e-02 s: 6.82e-01 e:  1 it: 9.54e-07 tt: 2.45e-04
+      19: f: 1.472012e-01 d: 4.38e-02 g: 3.11e+00 h: 1.20e-01 s: 3.47e-01 e:  2 it: 1.91e-06 tt: 2.49e-04
+      20: f: 1.093558e-01 d: 3.78e-02 g: 2.97e+00 h: 8.43e-02 s: 1.00e+00 e:  1 it: 2.15e-06 tt: 2.52e-04
+      21: f: 6.710346e-02 d: 4.23e-02 g: 1.42e+00 h: 9.64e-02 s: 8.85e-01 e:  1 it: 8.82e-06 tt: 2.81e-04
+      22: f: 3.993377e-02 d: 2.72e-02 g: 2.30e+00 h: 1.29e-01 s: 4.63e-01 e:  2 it: 7.87e-06 tt: 2.96e-04
+      23: f: 2.911794e-02 d: 1.08e-02 g: 2.55e+00 h: 6.55e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.00e-04
+      24: f: 1.457683e-02 d: 1.45e-02 g: 2.77e-01 h: 6.37e-02 s: 6.14e-01 e:  1 it: 1.19e-06 tt: 3.03e-04
+      25: f: 8.577515e-03 d: 6.00e-03 g: 2.86e+00 h: 1.40e-01 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.06e-04
+      26: f: 3.486574e-03 d: 5.09e-03 g: 1.76e-01 h: 1.23e-02 s: 1.00e+00 e:  1 it: 1.19e-06 tt: 3.09e-04
+      27: f: 1.257570e-03 d: 2.23e-03 g: 1.39e-01 h: 5.08e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.12e-04
+      28: f: 2.783568e-04 d: 9.79e-04 g: 6.20e-01 h: 6.47e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.15e-04
+      29: f: 2.533399e-05 d: 2.53e-04 g: 1.68e-02 h: 1.98e-03 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.17e-04
+      30: f: 7.591572e-07 d: 2.46e-05 g: 5.40e-03 h: 9.27e-03 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.20e-04
+      31: f: 1.902460e-09 d: 7.57e-07 g: 1.62e-03 h: 1.89e-03 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.23e-04
+      32: f: 1.003030e-12 d: 1.90e-09 g: 3.50e-05 h: 3.52e-05 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.26e-04
+      33: f: 4.835994e-17 d: 1.00e-12 g: 1.05e-07 h: 1.13e-06 s: 1.00e+00 e:  1 it: 1.19e-06 tt: 3.34e-04
+      34: f: 1.885250e-22 d: 4.84e-17 g: 2.69e-10 h: 1.45e-08 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.37e-04
 
-  Solver Summary (v 1.12.0-lapack-suitesparse-cxsparse-no_openmp)
+    Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.1.0)-metis-(5.1.0)-acceleratesparse-eigensparse)
 
-  Parameters                                  2
-  Line search direction              LBFGS (20)
-  Line search type                  CUBIC WOLFE
+    Parameters                                  2
+    Line search direction              LBFGS (20)
+    Line search type                  CUBIC WOLFE
 
 
-  Cost:
-  Initial                          2.420000e+01
-  Final                            1.885250e-22
-  Change                           2.420000e+01
+    Cost:
+    Initial                          2.420000e+01
+    Final                            1.955192e-27
+    Change                           2.420000e+01
 
-  Minimizer iterations                       35
+    Minimizer iterations                       36
 
-  Time (in seconds):
+    Time (in seconds):
 
-    Cost evaluation                       0.000
-    Gradient evaluation                   0.000
-  Total                                   0.003
+      Cost evaluation                    0.000000 (0)
+      Gradient & cost evaluation         0.000000 (44)
+      Polynomial minimization            0.000061
+    Total                                0.000438
 
-  Termination:                      CONVERGENCE (Gradient tolerance reached. Gradient max norm: 9.032775e-13 <= 1.000000e-10)
+    Termination:                      CONVERGENCE (Parameter tolerance reached. Relative step_norm: 1.890726e-11 <= 1.000000e-08.)
+
+    Initial x: -1.2 y: 1
+    Final   x: 1 y: 1
+
+
+
+
+If you are unable to use automatic differentiation for some reason
+(say because you need to call an external library), then you can
+use numeric differentiation. In that case the functor is defined as
+follows [#f2]_.
+
+.. code::
+
+  // f(x,y) = (1-x)^2 + 100(y - x^2)^2;
+  struct Rosenbrock {
+    bool operator()(const double* parameters, double* cost) const {
+      const double x = parameters[0];
+      const double y = parameters[1];
+      cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
+      return true;
+    }
+
+    static ceres::FirstOrderFunction* Create() {
+      constexpr int kNumParameters = 2;
+      return new ceres::NumericDiffFirstOrderFunction<Rosenbrock,
+                                                      ceres::CENTRAL,
+                                                      kNumParameters>();
+    }
+  };
+
+And finally, if you would rather compute the derivatives by hand (say
+because the size of the parameter vector is too large to be
+automatically differentiated), then you should define an instance of
+`FirstOrderFunction`, which is the analog of :class:`CostFunction` for
+non-linear least squares problems [#f3]_.
+
+.. code::
+
+  // f(x,y) = (1-x)^2 + 100(y - x^2)^2;
+  class Rosenbrock final : public ceres::FirstOrderFunction {
+   public:
+    bool Evaluate(const double* parameters,
+                  double* cost,
+                  double* gradient) const override {
+      const double x = parameters[0];
+      const double y = parameters[1];
+
+      cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
+      if (gradient) {
+        gradient[0] = -2.0 * (1.0 - x) - 200.0 * (y - x * x) * 2.0 * x;
+        gradient[1] = 200.0 * (y - x * x);
+      }
+      return true;
+    }
+
+    int NumParameters() const override { return 2; }
+  };
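+
+Whichever of these variants you use, the resulting
+:class:`FirstOrderFunction` is minimized with a
+:class:`GradientProblem` and :class:`GradientProblemSolver`. The
+driver below is only a minimal sketch (the complete programs are
+linked in the footnotes); it assumes the analytic ``Rosenbrock``
+class defined above.
+
+.. code::
+
+  #include <iostream>
+  #include "ceres/ceres.h"
+
+  int main() {
+    double parameters[2] = {-1.2, 1.0};
+
+    ceres::GradientProblem problem(new Rosenbrock());
+
+    ceres::GradientProblemSolver::Options options;
+    options.minimizer_progress_to_stdout = true;
+    ceres::GradientProblemSolver::Summary summary;
+    ceres::Solve(options, problem, parameters, &summary);
+
+    std::cout << summary.FullReport() << "\n";
+    std::cout << "x: " << parameters[0] << " y: " << parameters[1] << "\n";
+    return 0;
+  }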
 
 .. rubric:: Footnotes
 
 .. [#f1] `examples/rosenbrock.cc
    <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/rosenbrock.cc>`_
+
+.. [#f2] `examples/rosenbrock_numeric_diff.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/rosenbrock_numeric_diff.cc>`_
+
+.. [#f3] `examples/rosenbrock_analytic_diff.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/rosenbrock_analytic_diff.cc>`_
diff --git a/docs/source/index.rst b/docs/source/index.rst
index d72368f..497b12c 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -44,12 +44,15 @@
 
 If you use Ceres Solver for a publication, please cite it as::
 
-    @misc{ceres-solver,
-      author = "Sameer Agarwal and Keir Mierle and Others",
-      title = "Ceres Solver",
-      howpublished = "\url{http://ceres-solver.org}",
-    }
-
+  @software{Agarwal_Ceres_Solver_2022,
+    author = {Agarwal, Sameer and Mierle, Keir and The Ceres Solver Team},
+    title = {{Ceres Solver}},
+    license = {Apache-2.0},
+    url = {https://github.com/ceres-solver/ceres-solver},
+    version = {2.2},
+    year = {2023},
+    month = {10}
+  }
 
 .. rubric:: Footnotes
 
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
index 7f49783..4feb1a4 100644
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
@@ -9,7 +9,7 @@
 .. _section-source:
 
 You can start with the `latest stable release
-<http://ceres-solver.org/ceres-solver-2.0.0.tar.gz>`_ . Or if you want
+<http://ceres-solver.org/ceres-solver-2.2.0.tar.gz>`_ . Or if you want
 the latest version, you can clone the git repository
 
 .. code-block:: bash
@@ -21,15 +21,16 @@
 Dependencies
 ============
 
-  .. NOTE ::
+ .. note ::
 
-    Starting with v2.0 Ceres requires a **fully C++14-compliant**
-    compiler.  In versions <= 1.14, C++11 was an optional requirement.
+    Ceres Solver 2.2 requires a **fully C++17-compliant** compiler.
 
 Ceres relies on a number of open source libraries, some of which are
 optional. For details on customizing the build process, see
 :ref:`section-customizing` .
 
+- `CMake <http://www.cmake.org>`_ 3.16 or later **required**.
+
 - `Eigen <http://eigen.tuxfamily.org/index.php?title=Main_Page>`_
   3.3 or later **required**.
 
@@ -39,9 +40,7 @@
     library. Please see the documentation for ``EIGENSPARSE`` for
     more details.
 
-- `CMake <http://www.cmake.org>`_ 3.5 or later **required**.
-
-- `glog <https://github.com/google/glog>`_ 0.3.1 or
+- `glog <https://github.com/google/glog>`_ 0.3.5 or
   later. **Recommended**
 
   ``glog`` is used extensively throughout Ceres for logging detailed
@@ -65,22 +64,13 @@
   recommend against it. ``miniglog`` has worse performance than
   ``glog`` and is much harder to control and use.
 
-  .. NOTE ::
-
-     If you are compiling ``glog`` from source, please note that
-     currently, the unit tests for ``glog`` (which are enabled by
-     default) do not compile against a default build of ``gflags`` 2.1
-     as the gflags namespace changed from ``google::`` to
-     ``gflags::``.  A patch to fix this is available from `here
-     <https://code.google.com/p/google-glog/issues/detail?id=194>`_.
-
 - `gflags <https://github.com/gflags/gflags>`_. Needed to build
   examples and tests and usually a dependency for glog.
 
-- `SuiteSparse
-  <http://faculty.cse.tamu.edu/davis/suitesparse.html>`_. Needed for
-  solving large sparse linear systems. **Optional; strongly recomended
-  for large scale bundle adjustment**
+- `SuiteSparse <http://faculty.cse.tamu.edu/davis/suitesparse.html>`_
+  4.5.6 or later. Needed for solving large sparse linear
+  systems. **Optional; strongly recommended for large scale bundle
+  adjustment**
 
   .. NOTE ::
 
@@ -90,10 +80,10 @@
      found TBB version. You can customize the searched TBB location
      with the ``TBB_ROOT`` variable.
 
-- `CXSparse <http://faculty.cse.tamu.edu/davis/suitesparse.html>`_.
-  Similar to ``SuiteSparse`` but simpler and slower. CXSparse has
-  no dependencies on ``LAPACK`` and ``BLAS``. This makes for a simpler
-  build process and a smaller binary. **Optional**
+  A CMake native version of SuiteSparse that can be compiled on a variety of
+  platforms (e.g., using Visual Studio, Xcode, MinGW, etc.) is maintained by the
+  `CMake support for SuiteSparse <https://github.com/sergiud/SuiteSparse>`_
+  project.
 
 - `Apple's Accelerate sparse solvers <https://developer.apple.com/documentation/accelerate/sparse_solvers>`_.
   As of Xcode 9.0, Apple's Accelerate framework includes support for
@@ -104,9 +94,13 @@
   ``SuiteSparse``, and optionally used by Ceres directly for some
   operations.
 
-  On ``UNIX`` OSes other than macOS we recommend `ATLAS
+  For best performance on ``x86`` based Linux systems we recommend
+  using `Intel MKL
+  <https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html>`_.
+
+  Two other good options are `ATLAS
   <http://math-atlas.sourceforge.net/>`_, which includes ``BLAS`` and
-  ``LAPACK`` routines. It is also possible to use `OpenBLAS
+  ``LAPACK`` routines and `OpenBLAS
   <https://github.com/xianyi/OpenBLAS>`_ . However, one needs to be
   careful to `turn off the threading
   <https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-multi-threaded>`_
@@ -122,6 +116,15 @@
 
   **Optional but required for** ``SuiteSparse``.
 
+- `CUDA <https://developer.nvidia.com/cuda-toolkit>`_. If you have an
+  NVIDIA GPU, then Ceres Solver can use it to accelerate the solution
+  of the Gauss-Newton linear systems via the CMake option ``USE_CUDA``
+  (see the sketch below). Currently this support is limited to the
+  dense linear solvers that ship with ``CUDA``. As a result, GPU
+  acceleration can be used to speed up ``DENSE_QR``,
+  ``DENSE_NORMAL_CHOLESKY`` and ``DENSE_SCHUR``. This also enables
+  ``CUDA`` mixed precision solves for ``DENSE_NORMAL_CHOLESKY`` and
+  ``DENSE_SCHUR``.  **Optional**.
+
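+  A minimal configuration sketch (assuming the CUDA toolkit is
+  installed and the ``ceres-solver-2.2.0`` source layout used in the
+  Linux instructions below):
+
+  .. code-block:: bash
+
+     cmake -DUSE_CUDA=ON ../ceres-solver-2.2.0
+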
 .. _section-linux:
 
 Linux
@@ -130,11 +133,12 @@
 We will use `Ubuntu <http://www.ubuntu.com>`_ as our example linux
 distribution.
 
-  .. NOTE ::
+.. NOTE::
 
-    These instructions are for Ubuntu 18.04 and newer. On Ubuntu 16.04
-    you need to manually get a more recent version of Eigen, such as
-    3.3.7.
+   Ceres Solver always supports the previous and current Ubuntu LTS
+   releases, currently 18.04 and 20.04, using the default Ubuntu
+   repositories and compiler toolchain. Support for earlier versions
+   is not guaranteed or maintained.
 
 Start by installing all the dependencies.
 
@@ -144,21 +148,21 @@
      sudo apt-get install cmake
      # google-glog + gflags
      sudo apt-get install libgoogle-glog-dev libgflags-dev
-     # BLAS & LAPACK
+     # Use ATLAS for BLAS & LAPACK
      sudo apt-get install libatlas-base-dev
      # Eigen3
      sudo apt-get install libeigen3-dev
-     # SuiteSparse and CXSparse (optional)
+     # SuiteSparse (optional)
      sudo apt-get install libsuitesparse-dev
 
 We are now ready to build, test, and install Ceres.
 
 .. code-block:: bash
 
- tar zxf ceres-solver-2.0.0.tar.gz
+ tar zxf ceres-solver-2.2.0.tar.gz
  mkdir ceres-bin
  cd ceres-bin
- cmake ../ceres-solver-2.0.0
+ cmake ../ceres-solver-2.2.0
  make -j3
  make test
  # Optionally install Ceres, it can also be exported using CMake which
@@ -172,7 +176,7 @@
 
 .. code-block:: bash
 
- bin/simple_bundle_adjuster ../ceres-solver-2.0.0/data/problem-16-22106-pre.txt
+ bin/simple_bundle_adjuster ../ceres-solver-2.2.0/data/problem-16-22106-pre.txt
 
 This runs Ceres for a maximum of 10 iterations using the
 ``DENSE_SCHUR`` linear solver. The output should look something like
@@ -181,63 +185,63 @@
 .. code-block:: bash
 
     iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
-       0  4.185660e+06    0.00e+00    1.09e+08   0.00e+00   0.00e+00  1.00e+04       0    7.59e-02    3.37e-01
-       1  1.062590e+05    4.08e+06    8.99e+06   5.36e+02   9.82e-01  3.00e+04       1    1.65e-01    5.03e-01
-       2  4.992817e+04    5.63e+04    8.32e+06   3.19e+02   6.52e-01  3.09e+04       1    1.45e-01    6.48e-01
-       3  1.899774e+04    3.09e+04    1.60e+06   1.24e+02   9.77e-01  9.26e+04       1    1.43e-01    7.92e-01
-       4  1.808729e+04    9.10e+02    3.97e+05   6.39e+01   9.51e-01  2.78e+05       1    1.45e-01    9.36e-01
-       5  1.803399e+04    5.33e+01    1.48e+04   1.23e+01   9.99e-01  8.33e+05       1    1.45e-01    1.08e+00
-       6  1.803390e+04    9.02e-02    6.35e+01   8.00e-01   1.00e+00  2.50e+06       1    1.50e-01    1.23e+00
+       0  4.185660e+06    0.00e+00    1.09e+08   0.00e+00   0.00e+00  1.00e+04        0    2.18e-02    6.57e-02
+       1  1.062590e+05    4.08e+06    8.99e+06   0.00e+00   9.82e-01  3.00e+04        1    5.07e-02    1.16e-01
+       2  4.992817e+04    5.63e+04    8.32e+06   3.19e+02   6.52e-01  3.09e+04        1    4.75e-02    1.64e-01
+       3  1.899774e+04    3.09e+04    1.60e+06   1.24e+02   9.77e-01  9.26e+04        1    4.74e-02    2.11e-01
+       4  1.808729e+04    9.10e+02    3.97e+05   6.39e+01   9.51e-01  2.78e+05        1    4.75e-02    2.59e-01
+       5  1.803399e+04    5.33e+01    1.48e+04   1.23e+01   9.99e-01  8.33e+05        1    4.74e-02    3.06e-01
+       6  1.803390e+04    9.02e-02    6.35e+01   8.00e-01   1.00e+00  2.50e+06        1    4.76e-02    3.54e-01
 
-    Ceres Solver v2.0.0 Solve Report
-    ----------------------------------
+    Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.1.0)-metis-(5.1.0)-acceleratesparse-eigensparse)
+
                                          Original                  Reduced
     Parameter blocks                        22122                    22122
     Parameters                              66462                    66462
     Residual blocks                         83718                    83718
-    Residual                               167436                   167436
+    Residuals                              167436                   167436
 
     Minimizer                        TRUST_REGION
 
     Dense linear algebra library            EIGEN
     Trust region strategy     LEVENBERG_MARQUARDT
-
                                             Given                     Used
     Linear solver                     DENSE_SCHUR              DENSE_SCHUR
     Threads                                     1                        1
-    Linear solver threads                       1                        1
-    Linear solver ordering              AUTOMATIC                22106, 16
+    Linear solver ordering              AUTOMATIC                 22106,16
+    Schur structure                         2,3,9                    2,3,9
 
     Cost:
     Initial                          4.185660e+06
     Final                            1.803390e+04
     Change                           4.167626e+06
 
-    Minimizer iterations                        6
-    Successful steps                            6
+    Minimizer iterations                        7
+    Successful steps                            7
     Unsuccessful steps                          0
 
     Time (in seconds):
-    Preprocessor                            0.261
+    Preprocessor                         0.043895
 
-      Residual evaluation                   0.082
-      Jacobian evaluation                   0.412
-      Linear solver                         0.442
-    Minimizer                               1.051
+      Residual only evaluation           0.029855 (7)
+      Jacobian & residual evaluation     0.120581 (7)
+      Linear solver                      0.153665 (7)
+    Minimizer                            0.339275
 
-    Postprocessor                           0.002
-    Total                                   1.357
+    Postprocessor                        0.000540
+    Total                                0.383710
 
-    Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 1.769766e-09 <= 1.000000e-06)
+    Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 1.769759e-09 <= 1.000000e-06)
+
 
 .. section-macos:
 
 macOS
 =====
 
-On macOS, you can either use `Homebrew
-<https://brew.sh/>`_ (recommended) or `MacPorts
-<https://www.macports.org/>`_ to install Ceres Solver.
+On macOS, you can either use `Homebrew <https://brew.sh/>`_
+(recommended) or `MacPorts <https://www.macports.org/>`_ to install
+Ceres Solver.
 
 If using `Homebrew <https://brew.sh/>`_, then
 
@@ -277,17 +281,17 @@
       brew install glog gflags
       # Eigen3
       brew install eigen
-      # SuiteSparse and CXSparse
+      # SuiteSparse
       brew install suite-sparse
 
 We are now ready to build, test, and install Ceres.
 
 .. code-block:: bash
 
-   tar zxf ceres-solver-2.0.0.tar.gz
+   tar zxf ceres-solver-2.2.0.tar.gz
    mkdir ceres-bin
    cd ceres-bin
-   cmake ../ceres-solver-2.0.0
+   cmake ../ceres-solver-2.2.0
    make -j3
    make test
    # Optionally install Ceres, it can also be exported using CMake which
@@ -295,53 +299,59 @@
    # documentation for the EXPORT_BUILD_DIR option for more information.
    make install
 
-Building with OpenMP on macOS
------------------------------
-
-Up to at least Xcode 12, OpenMP support was disabled in Apple's version of
-Clang.  However, you can install the latest version of the LLVM toolchain
-from Homebrew which does support OpenMP, and thus build Ceres with OpenMP
-support on macOS.  To do this, you must install llvm via Homebrew:
-
-.. code-block:: bash
-
-      # Install latest version of LLVM toolchain.
-      brew install llvm
-
-As the LLVM formula in Homebrew is keg-only, it will not be installed to
-``/usr/local`` to avoid conflicts with the standard Apple LLVM toolchain.
-To build Ceres with the Homebrew LLVM toolchain you should do the
-following:
-
-.. code-block:: bash
-
-   tar zxf ceres-solver-2.0.0.tar.gz
-   mkdir ceres-bin
-   cd ceres-bin
-   # Configure the local shell only (not persistent) to use the Homebrew LLVM
-   # toolchain in favour of the default Apple version.  This is taken
-   # verbatim from the instructions output by Homebrew when installing the
-   # llvm formula.
-   export LDFLAGS="-L/usr/local/opt/llvm/lib -Wl,-rpath,/usr/local/opt/llvm/lib"
-   export CPPFLAGS="-I/usr/local/opt/llvm/include"
-   export PATH="/usr/local/opt/llvm/bin:$PATH"
-   # Force CMake to use the Homebrew version of Clang and enable OpenMP.
-   cmake -DCMAKE_C_COMPILER=/usr/local/opt/llvm/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/opt/llvm/bin/clang++ -DCERES_THREADING_MODEL=OPENMP ../ceres-solver-2.0.0
-   make -j3
-   make test
-   # Optionally install Ceres.  It can also be exported using CMake which
-   # allows Ceres to be used without requiring installation.  See the
-   # documentation for the EXPORT_BUILD_DIR option for more information.
-   make install
-
-Like the Linux build, you should now be able to run
-``bin/simple_bundle_adjuster``.
-
 .. _section-windows:
 
 Windows
 =======
 
+Using a Library Manager
+-----------------------
+
+`vcpkg <https://github.com/microsoft/vcpkg>`_ is a library manager for Microsoft
+Windows that can be used to install Ceres Solver and all its dependencies.
+
+#. Install the library manager into a top-level directory ``vcpkg/`` on Windows
+   following the `guide
+   <https://github.com/microsoft/vcpkg#quick-start-windows>`_, e.g., using
+   Visual Studio 2022 community edition, or simply run
+
+    .. code:: bat
+
+        git clone https://github.com/Microsoft/vcpkg.git
+        cd vcpkg
+        .\bootstrap-vcpkg.bat
+        .\vcpkg integrate install
+
+#. Use vcpkg to install and build Ceres and all its dependencies, e.g., for 64
+   bit Windows
+
+   .. code:: bat
+
+      vcpkg\vcpkg.exe install ceres:x64-windows
+
+   Or with optional components, e.g., SuiteSparse, using
+
+   .. code:: bat
+
+      vcpkg\vcpkg.exe install ceres[suitesparse]:x64-windows
+
+#. Integrate vcpkg packages with Visual Studio to allow it to automatically
+   find all the libraries installed by vcpkg.
+
+   .. code:: bat
+
+      vcpkg\vcpkg.exe integrate install
+
+#. To use Ceres in a CMake project, follow our :ref:`instructions
+   <section-using-ceres>`.
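+
+   As a minimal sketch (``my_app`` and ``main.cc`` are placeholder
+   names; see the linked instructions for the full details), the
+   consuming project's ``CMakeLists.txt`` would contain something like
+
+   .. code:: cmake
+
+      find_package(Ceres REQUIRED)
+      add_executable(my_app main.cc)
+      target_link_libraries(my_app PRIVATE Ceres::ceres)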
+
+
+Building from Source
+--------------------
+
+Ceres Solver can also be built from source. For this purpose, we support Visual
+Studio 2019 and newer.
+
 .. NOTE::
 
   If you find the following CMake difficult to set up, then you may
@@ -349,36 +359,11 @@
   <https://github.com/tbennun/ceres-windows>`_ for Ceres Solver by Tal
   Ben-Nun.
 
-On Windows, we support building with Visual Studio 2015.2 of newer. Note
-that the Windows port is less featureful and less tested than the
-Linux or macOS versions due to the lack of an officially supported
-way of building SuiteSparse and CXSparse.  There are however a number
-of unofficial ways of building these libraries. Building on Windows
-also a bit more involved since there is no automated way to install
-dependencies.
+#. Create a top-level directory for dependencies, build, and sources somewhere,
+   e.g., ``ceres/``
 
-.. NOTE:: Using ``google-glog`` & ``miniglog`` with windows.h.
-
- The windows.h header if used with GDI (Graphics Device Interface)
- defines ``ERROR``, which conflicts with the definition of ``ERROR``
- as a LogSeverity level in ``google-glog`` and ``miniglog``.  There
- are at least two possible fixes to this problem:
-
- #. Use ``google-glog`` and define ``GLOG_NO_ABBREVIATED_SEVERITIES``
-    when building Ceres and your own project, as documented `here
-    <http://google-glog.googlecode.com/svn/trunk/doc/glog.html>`__.
-    Note that this fix will not work for ``miniglog``, but use of
-    ``miniglog`` is strongly discouraged on any platform for which
-    ``google-glog`` is available (which includes Windows).
- #. If you do not require GDI, then define ``NOGDI`` **before**
-    including windows.h.  This solution should work for both
-    ``google-glog`` and ``miniglog`` and is documented for
-    ``google-glog`` `here
-    <https://code.google.com/p/google-glog/issues/detail?id=33>`__.
-
-#. Make a toplevel directory for deps & build & src somewhere: ``ceres/``
 #. Get dependencies; unpack them as subdirectories in ``ceres/``
-   (``ceres/eigen``, ``ceres/glog``, etc)
+   (``ceres/eigen``, ``ceres/glog``, etc.)
 
    #. ``Eigen`` 3.3 . Configure and optionally install Eigen. It should be
       exported into the CMake package registry by default as part of the
@@ -394,45 +379,55 @@
       project.  If you wish to use ``SuiteSparse``, follow their
       instructions for obtaining and building it.
 
-   #. (Experimental) ``CXSparse`` Previously CXSparse was not
-      available on Windows, there are now several ports that enable it
-      to be, including: `[1] <https://github.com/PetterS/CXSparse>`_
-      and `[2] <https://github.com/TheFrenchLeaf/CXSparse>`_.  If you
-      wish to use ``CXSparse``, follow their instructions for
-      obtaining and building it.
+      Alternatively, Ceres Solver supports ``SuiteSparse`` binary
+      packages available for Visual Studio 2019 and 2022 provided by
+      the `CMake support for SuiteSparse
+      <https://github.com/sergiud/SuiteSparse>`_ project that also
+      include `reference LAPACK <http://www.netlib.org/blas>`_ (and
+      BLAS). The binary packages are used by Ceres Solver for
+      continuous testing on Github.
 
 #. Unpack the Ceres tarball into ``ceres``. For the tarball, you
    should get a directory inside ``ceres`` similar to
-   ``ceres-solver-2.0.0``. Alternately, checkout Ceres via ``git`` to
+   ``ceres-solver-2.2.0``. Alternately, checkout Ceres via ``git`` to
    get ``ceres-solver.git`` inside ``ceres``.
 
 #. Install ``CMake``,
 
-#. Make a dir ``ceres/ceres-bin`` (for an out-of-tree build)
+#. Create a directory ``ceres/ceres-bin`` (for an out-of-tree build)
+
+   #. If you use the above binary ``SuiteSparse`` package, make sure CMake can
+      find it, e.g., by assigning the path of the directory that contains the
+      unzipped contents to the ``CMAKE_PREFIX_PATH`` environment variable. In a
+      Windows command prompt this can be achieved as follows:
+
+      .. code:: bat
+
+        set CMAKE_PREFIX_PATH=C:/Downloads/SuiteSparse-5.11.0-cmake.1-vc16-Win64-Release-shared-gpl
 
 #. Run ``CMake``; select the ``ceres-solver-X.Y.Z`` or
    ``ceres-solver.git`` directory for the CMake file. Then select the
-   ``ceres-bin`` for the build dir.
+   ``ceres-bin`` for the build directory.
 
-#. Try running ``Configure``. It won't work. It'll show a bunch of options.
-   You'll need to set:
+#. Try running ``Configure`` which can fail at first because some dependencies
+   cannot be automatically located. In this case, you must set the following
+   CMake variables to the appropriate directories where you unpacked/built them:
 
    #. ``Eigen3_DIR`` (Set to directory containing ``Eigen3Config.cmake``)
    #. ``GLOG_INCLUDE_DIR_HINTS``
    #. ``GLOG_LIBRARY_DIR_HINTS``
    #. (Optional) ``gflags_DIR`` (Set to directory containing ``gflags-config.cmake``)
-   #. (Optional) ``SUITESPARSE_INCLUDE_DIR_HINTS``
-   #. (Optional) ``SUITESPARSE_LIBRARY_DIR_HINTS``
-   #. (Optional) ``CXSPARSE_INCLUDE_DIR_HINTS``
-   #. (Optional) ``CXSPARSE_LIBRARY_DIR_HINTS``
+   #. (SuiteSparse binary package) ``BLAS_blas_LIBRARY`` and
+      ``LAPACK_lapack_LIBRARY`` CMake variables must be `explicitly set` to
+      ``<path>/lib/blas.lib`` and ``<path>/lib/lapack.lib``, respectively, both
+      located in the unzipped package directory ``<path>``.
 
-   to the appropriate directories where you unpacked/built them. If
-   any of the variables are not visible in the ``CMake`` GUI, create a
-   new entry for them.  We recommend using the
-   ``<NAME>_(INCLUDE/LIBRARY)_DIR_HINTS`` variables rather than
-   setting the ``<NAME>_INCLUDE_DIR`` & ``<NAME>_LIBRARY`` variables
-   directly to keep all of the validity checking, and to avoid having
-   to specify the library files manually.
+   If any of the variables are not visible in the ``CMake`` GUI, create a new
+   entry for them.  We recommend using the
+   ``<NAME>_(INCLUDE/LIBRARY)_DIR_HINTS`` variables rather than setting the
+   ``<NAME>_INCLUDE_DIR`` & ``<NAME>_LIBRARY`` variables directly to keep all of
+   the validity checking, and to avoid having to specify the library files
+   manually.
 
 #. You may have to tweak some more settings to generate a MSVC
    project.  After each adjustment, try pressing Configure & Generate
@@ -447,17 +442,15 @@
 Like the Linux build, you should now be able to run
 ``bin/simple_bundle_adjuster``.
 
-Notes:
+.. note::
 
-#. The default build is Debug; consider switching it to release mode.
-#. Currently ``system_test`` is not working properly.
-#. CMake puts the resulting test binaries in ``ceres-bin/examples/Debug``
-   by default.
-#. The solvers supported on Windows are ``DENSE_QR``, ``DENSE_SCHUR``,
-   ``CGNR``, and ``ITERATIVE_SCHUR``.
-#. We're looking for someone to work with upstream ``SuiteSparse`` to
-   port their build system to something sane like ``CMake``, and get a
-   fully supported Windows port.
+    #. The default build is ``Debug``; consider switching it to ``Release`` for
+       optimal performance.
+    #. CMake puts the resulting test binaries in ``ceres-bin/examples/Debug`` by
+       default.
+    #. Without a sparse linear algebra library, only a subset of
+       solvers is usable, namely: ``DENSE_QR``, ``DENSE_SCHUR``,
+       ``CGNR``, and ``ITERATIVE_SCHUR``.
 
 
 .. _section-android:
@@ -556,9 +549,7 @@
 The default CMake configuration builds a bare bones version of Ceres
 Solver that only depends on Eigen (``MINIGLOG`` is compiled into Ceres
 if it is used), this should be sufficient for solving small to
-moderate sized problems (No ``SPARSE_SCHUR``,
-``SPARSE_NORMAL_CHOLESKY`` linear solvers and no ``CLUSTER_JACOBI``
-and ``CLUSTER_TRIDIAGONAL`` preconditioners).
+moderate sized problems.
 
 If you decide to use ``LAPACK`` and ``BLAS``, then you also need to
 add ``Accelerate.framework`` to your Xcode project's linking
@@ -648,22 +639,14 @@
       terms.  Ceres requires some components that are only licensed under
       GPL/Commercial terms.
 
-#. ``CXSPARSE [Default: ON]``: By default, Ceres will link to
-   ``CXSparse`` if all its dependencies are present. Turn this ``OFF``
-   to build Ceres without ``CXSparse``.
-
-   .. NOTE::
-
-      CXSparse is licensed under the LGPL.
-
 #. ``ACCELERATESPARSE [Default: ON]``: By default, Ceres will link to
    Apple's Accelerate framework directly if a version of it is detected
    which supports solving sparse linear systems.  Note that on Apple OSs
    Accelerate usually also provides the BLAS/LAPACK implementations and
    so would be linked against irrespective of the value of ``ACCELERATESPARSE``.
 
-#. ``EIGENSPARSE [Default: ON]``: By default, Ceres will not use
-   Eigen's sparse Cholesky factorization.
+#. ``EIGENSPARSE [Default: ON]``: By default, Ceres will use Eigen's
+   sparse Cholesky factorization.
 
 #. ``GFLAGS [Default: ON]``: Turn this ``OFF`` to build Ceres without
    ``gflags``. This will also prevent some of the example code from
@@ -680,11 +663,6 @@
    gains in the ``SPARSE_SCHUR`` solver, you can disable some of the
    template specializations by turning this ``OFF``.
 
-#. ``CERES_THREADING_MODEL [Default: CXX_THREADS > OPENMP > NO_THREADS]``:
-   Multi-threading backend Ceres should be compiled with.  This will
-   automatically be set to only accept the available subset of threading
-   options in the CMake GUI.
-
 #. ``BUILD_SHARED_LIBS [Default: OFF]``: By default Ceres is built as
    a static library, turn this ``ON`` to instead build Ceres as a
    shared library.
@@ -701,8 +679,8 @@
 
 #. ``BUILD_DOCUMENTATION [Default: OFF]``: Use this to enable building
    the documentation, requires `Sphinx <http://sphinx-doc.org/>`_ and
-   the `sphinx-better-theme
-   <https://pypi.python.org/pypi/sphinx-better-theme>`_ package
+   the `sphinx-rtd-theme
+   <https://pypi.org/project/sphinx-rtd-theme/>`_ package
    available from the Python package index. In addition, ``make
    ceres_docs`` can be used to build only the documentation.
 
@@ -865,25 +843,18 @@
 
 #. ``SuiteSparse``: Ceres built with SuiteSparse (``SUITESPARSE=ON``).
 
-#. ``CXSparse``: Ceres built with CXSparse (``CXSPARSE=ON``).
-
 #. ``AccelerateSparse``: Ceres built with Apple's Accelerate sparse solvers (``ACCELERATESPARSE=ON``).
 
 #. ``EigenSparse``: Ceres built with Eigen's sparse Cholesky factorization
    (``EIGENSPARSE=ON``).
 
-#. ``SparseLinearAlgebraLibrary``: Ceres built with *at least one* sparse linear
-   algebra library.  This is equivalent to ``SuiteSparse`` **OR** ``CXSparse``
-   **OR** ``AccelerateSparse``  **OR** ``EigenSparse``.
+#. ``SparseLinearAlgebraLibrary``: Ceres built with *at least one*
+   sparse linear algebra library.  This is equivalent to
+   ``SuiteSparse`` **OR** ``AccelerateSparse`` **OR** ``EigenSparse``.
 
 #. ``SchurSpecializations``: Ceres built with Schur specializations
    (``SCHUR_SPECIALIZATIONS=ON``).
 
-#. ``OpenMP``: Ceres built with OpenMP (``CERES_THREADING_MODEL=OPENMP``).
-
-#. ``Multithreading``: Ceres built with *a* multithreading library.
-   This is equivalent to (``CERES_THREAD != NO_THREADS``).
-
 To specify one/multiple Ceres components use the ``COMPONENTS`` argument to
 `find_package()
 <http://www.cmake.org/cmake/help/v3.5/command/find_package.html>`_ like so:
diff --git a/docs/source/interfacing_with_autodiff.rst b/docs/source/interfacing_with_autodiff.rst
index 02f58b2..fa05835 100644
--- a/docs/source/interfacing_with_autodiff.rst
+++ b/docs/source/interfacing_with_autodiff.rst
@@ -37,7 +37,7 @@
    template <typename T> T TemplatedComputeDistortion(const T r2) {
      const double k1 = 0.0082;
      const double k2 = 0.000023;
-     return 1.0 + k1 * y2 + k2 * r2 * r2;
+     return 1.0 + k1 * r2 + k2 * r2 * r2;
    }
 
    struct Affine2DWithDistortion {
@@ -118,12 +118,13 @@
        y[0] = y_in[0];
        y[1] = y_in[1];
 
-       compute_distortion.reset(new ceres::CostFunctionToFunctor<1, 1>(
-            new ceres::NumericDiffCostFunction<ComputeDistortionValueFunctor,
-                                               ceres::CENTRAL,
-                                               1,
-                                               1>(
-               new ComputeDistortionValueFunctor)));
+       compute_distortion = std::make_unique<ceres::CostFunctionToFunctor<1, 1>>(
+         std::make_unique<ceres::NumericDiffCostFunction<
+               ComputeDistortionValueFunctor
+             , ceres::CENTRAL, 1, 1
+           >
+         >()
+       );
      }
 
      template <typename T>
@@ -140,7 +141,7 @@
 
      double x[2];
      double y[2];
-     std::unique_ptr<ceres::CostFunctionToFunctor<1, 1> > compute_distortion;
+     std::unique_ptr<ceres::CostFunctionToFunctor<1, 1>> compute_distortion;
    };
 
 
@@ -148,7 +149,7 @@
 ------------------------------------------------
 
 Now suppose we are given a function :code:`ComputeDistortionValue`
-thatis able to compute its value and optionally its Jacobian on demand
+that is able to compute its value and optionally its Jacobian on demand
 and has the following signature:
 
 .. code-block:: c++
diff --git a/docs/source/inverse_and_implicit_function_theorems.rst b/docs/source/inverse_and_implicit_function_theorems.rst
new file mode 100644
index 0000000..7d8f7fa
--- /dev/null
+++ b/docs/source/inverse_and_implicit_function_theorems.rst
@@ -0,0 +1,214 @@
+.. default-domain:: cpp
+
+.. cpp:namespace:: ceres
+
+.. _chapter-inverse_function_theorem:
+
+==========================================
+Using Inverse & Implicit Function Theorems
+==========================================
+
+Until now we have considered methods for computing derivatives that
+work directly on the function being differentiated. However, this is
+not always possible. For example, if the function can only be computed
+via an iterative algorithm, or there is no explicit definition of the
+function available.  In this section we will see how we can use two
+basic results from calculus to get around these difficulties.
+
+
+Inverse Function Theorem
+========================
+
+Suppose we wish to evaluate the derivative of a function :math:`f(x)`,
+but evaluating :math:`f(x)` is not easy. Say it involves running an
+iterative algorithm. You could try automatically differentiating the
+iterative algorithm, but even if that is possible, it can become quite
+expensive.
+
+In some cases we get lucky, and computing the inverse of :math:`f(x)`
+is an easy operation. In these cases, we can use the `Inverse Function
+Theorem <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to
+compute the derivative exactly. Here is the key idea:
+
+Assuming that :math:`y=f(x)` is continuously differentiable in a
+neighborhood of a point :math:`x` and :math:`Df(x)` is the invertible
+Jacobian of :math:`f` at :math:`x`, then by applying the chain rule to
+the identity :math:`f^{-1}(f(x)) = x`, we have
+:math:`Df^{-1}(f(x))Df(x) = I`, or :math:`Df^{-1}(y) = (Df(x))^{-1}`,
+i.e., the Jacobian of :math:`f^{-1}` is the inverse of the Jacobian of
+:math:`f`, or :math:`Df(x) = (Df^{-1}(y))^{-1}`.
+
+For example, let :math:`f(x) = e^x`. Now of course we know that
+:math:`Df(x) = e^x`, but let's try and compute it via the Inverse
+Function Theorem. For :math:`x > 0`, we have :math:`f^{-1}(y) = \log
+y`, so :math:`Df^{-1}(y) = \frac{1}{y}`, so :math:`Df(x) =
+(Df^{-1}(y))^{-1} = y = e^x`.
+
+You may be wondering why the above is true. In a small neighborhood, a
+smoothly differentiable function is well approximated by a linear
+function. Indeed, this is a good way to think about the Jacobian: it is
+the matrix that best approximates the function linearly. Once you adopt
+this view, it is straightforward to see that *locally* :math:`f^{-1}(y)`
+is best approximated linearly by the inverse of the Jacobian of
+:math:`f(x)`.
+
+Let us now consider a more practical example.
+
+Geodetic Coordinate System Conversion
+-------------------------------------
+
+When working with data related to the Earth, one can use two different
+coordinate systems: the familiar (latitude, longitude, height)
+Latitude-Longitude-Altitude (LLA) coordinate system, or the `ECEF
+<http://en.wikipedia.org/wiki/ECEF>`_ coordinate system. The former is
+intuitive but not terribly convenient analytically. The latter is a
+Cartesian system but not particularly intuitive. So systems that
+process Earth-related data have to go back and forth between these
+coordinate systems.
+
+The conversion between the LLA and the ECEF coordinate system requires
+a model of the Earth, the most commonly used one being `WGS84
+<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.
+
+Going from the spherical :math:`(\phi,\lambda,h)` to the ECEF
+:math:`(x,y,z)` coordinates is easy.
+
+.. math::
+
+   \chi &= \sqrt{1 - e^2 \sin^2 \phi}
+
+   X &= \left( \frac{a}{\chi} + h \right) \cos \phi \cos \lambda
+
+   Y &= \left( \frac{a}{\chi} + h \right) \cos \phi \sin \lambda
+
+   Z &= \left(\frac{a(1-e^2)}{\chi}  +h \right) \sin \phi
+
+Here :math:`a` and :math:`e^2` are constants defined by `WGS84
+<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.
+
+Going from ECEF to LLA coordinates requires an iterative algorithm. So
+to compute the derivative of this transformation, we invoke the
+Inverse Function Theorem as follows:
+
+.. code-block:: c++
+
+   Eigen::Vector3d ecef; // Fill some values
+   // Iterative computation.
+   Eigen::Vector3d lla = ECEFToLLA(ecef);
+   // Analytic derivatives
+   Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
+   bool invertible;
+   Eigen::Matrix3d ecef_to_lla_jacobian;
+   lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);
+
+
+Implicit Function Theorem
+=========================
+
+Consider now the problem where we have two variables :math:`x \in
+\mathbb{R}^m` and :math:`y \in \mathbb{R}^n` and a function
+:math:`F:\mathbb{R}^m \times \mathbb{R}^n \rightarrow \mathbb{R}^n`
+such that :math:`F(x,y) = 0` and we wish to calculate the Jacobian of
+:math:`y` with respect to :math:`x`. How do we do this?
+
+If for a given value of :math:`(x,y)`, the partial Jacobian
+:math:`D_2F(x,y)` is full rank, then the `Implicit Function Theorem
+<https://en.wikipedia.org/wiki/Implicit_function_theorem>`_ tells us
+that there exists a neighborhood of :math:`x` and a function :math:`G`
+such that :math:`y = G(x)` in this neighborhood. Differentiating
+:math:`F(x,G(x)) = 0` gives us
+
+.. math::
+
+   D_1F(x,y) + D_2F(x,y)DG(x) &= 0
+
+                        DG(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)
+
+                        D y(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)
+
+This means that we can compute the derivative of :math:`y` with
+respect to :math:`x` by multiplying the Jacobian of :math:`F` w.r.t
+:math:`x` by the inverse of the Jacobian of :math:`F` w.r.t :math:`y`.
+
+Let's consider two examples.
+
+Roots of a Polynomial
+---------------------
+
+The first example we consider is a classic. Let :math:`p(x) = a_0 +
+a_1 x + \dots + a_n x^n` be a degree :math:`n` polynomial, and we wish
+to compute the derivative of its roots with respect to its
+coefficients. There is no closed form formula for computing the roots
+of a general degree :math:`n` polynomial. `Galois
+<https://en.wikipedia.org/wiki/%C3%89variste_Galois>`_ and `Abel
+<https://en.wikipedia.org/wiki/Niels_Henrik_Abel>`_ proved that. There
+are numerical algorithms like computing the eigenvalues of the
+`Companion Matrix
+<https://nhigham.com/2021/03/23/what-is-a-companion-matrix/>`_, but
+differentiating an eigenvalue solver does not seem like fun. But the
+Implicit Function Theorem offers us a simple path.
+
+If :math:`x` is a root of :math:`p(x)`, then :math:`F(\mathbf{a}, x) =
+a_0 + a_1 x + \dots + a_n x^n = 0`. So,
+
+.. math::
+
+   D_1 F(\mathbf{a}, x) &= [1, x, x^2, \dots, x^n]
+
+   D_2 F(\mathbf{a}, x) &= \sum_{k=1}^n k a_k x^{k-1} = Dp(x)
+
+   Dx(\mathbf{a}) &= \frac{-1}{Dp(x)} [1, x, x^2, \dots, x^n]
+
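+As a small self-contained sketch of this formula (not part of Ceres;
+the polynomial :math:`p(x) = x^2 - 2` and its root :math:`\sqrt{2}`
+are chosen purely for illustration):
+
+.. code-block:: c++
+
+   #include <cmath>
+   #include <iostream>
+
+   int main() {
+     // p(x) = a0 + a1 x + a2 x^2 with a = (-2, 0, 1), i.e., p(x) = x^2 - 2.
+     const double a1 = 0.0, a2 = 1.0;
+     const double x = std::sqrt(2.0);      // A root of p.
+     const double dp = a1 + 2.0 * a2 * x;  // p'(x) = D_2 F(a, x).
+     // Dx(a) = -[1, x, x^2] / p'(x): sensitivity of the root to each coefficient.
+     std::cout << -1.0 / dp << " " << -x / dp << " " << -(x * x) / dp << "\n";
+     return 0;
+   }
+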
+Differentiating the Solution to an Optimization Problem
+-------------------------------------------------------
+
+Sometimes we are required to solve optimization problems inside
+optimization problems, and this requires computing the derivative of
+the optimal solution (or a fixed point) of an optimization problem
+w.r.t its parameters.
+
+Let :math:`\theta \in \mathbb{R}^m` be a vector, :math:`A(\theta) \in
+\mathbb{R}^{k\times n}` be a matrix whose entries are a function of
+:math:`\theta` with :math:`k \ge n` and let :math:`b \in \mathbb{R}^k`
+be a constant vector, then consider the linear least squares problem:
+
+.. math::
+
+   x^* = \arg \min_x \|A(\theta) x - b\|_2^2
+
+How do we compute :math:`D_\theta x^*(\theta)`?
+
+One approach would be to observe that :math:`x^*(\theta) =
+(A^\top(\theta)A(\theta))^{-1}A^\top(\theta)b` and then differentiate
+this w.r.t :math:`\theta`. But this would require differentiating
+through the inverse of the matrix
+:math:`(A^\top(\theta)A(\theta))^{-1}`. Not exactly easy. Let's use
+the Implicit Function Theorem instead.
+
+The first step is to observe that :math:`x^*` satisfies the so called
+*normal equations*.
+
+.. math::
+
+   A^\top(\theta)A(\theta)x^* - A^\top(\theta)b = 0
+
+We will compute :math:`D_\theta x^*` column-wise, treating
+:math:`A(\theta)` as a function of one coordinate (:math:`\theta_i`)
+of :math:`\theta` at a time. So using the normal equations, let's
+define :math:`F(\theta_i, x^*) = A^\top(\theta_i)A(\theta_i)x^* -
+A^\top(\theta_i)b = 0`. Using this, we can now compute:
+
+.. math::
+
+   D_1F(\theta_i, x^*) &= D_{\theta_i}A^\top A x^* + A^\top
+   D_{\theta_i}A x^* - D_{\theta_i} A^\top b = g_i
+
+   D_2F(\theta_i, x^*) &= A^\top A
+
+   Dx^*(\theta_i) & = -(A^\top A)^{-1} g_i
+
+   Dx^*(\theta) & = -(A^\top A )^{-1} \left[g_1, \dots, g_m\right]
+
+Observe that to compute :math:`D x^*(\theta)` we only need the inverse
+of :math:`A^\top A`, which we needed anyway to compute :math:`x^*`.
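+
+A minimal sketch of this computation for a single scalar
+:math:`\theta` using Eigen (not part of Ceres; the particular
+:math:`A(\theta)` and :math:`b` below are made up for illustration):
+
+.. code-block:: c++
+
+   #include <Eigen/Dense>
+   #include <iostream>
+
+   int main() {
+     const double theta = 0.3;
+     // Toy parameterization: A(theta) is 3x2 and depends smoothly on theta.
+     Eigen::Matrix<double, 3, 2> A;
+     A << 1.0, theta, theta, 1.0, theta * theta, 2.0;
+     Eigen::Matrix<double, 3, 2> dA;  // dA/dtheta for this parameterization.
+     dA << 0.0, 1.0, 1.0, 0.0, 2.0 * theta, 0.0;
+     const Eigen::Vector3d b(1.0, 2.0, 3.0);
+
+     // x* from the normal equations A^T A x* = A^T b.
+     const Eigen::Matrix2d AtA = A.transpose() * A;
+     const Eigen::Vector2d x_star = AtA.ldlt().solve(A.transpose() * b);
+
+     // g = D_theta(A^T) A x* + A^T D_theta(A) x* - D_theta(A^T) b.
+     const Eigen::Vector2d g =
+         (dA.transpose() * A + A.transpose() * dA) * x_star - dA.transpose() * b;
+     // Dx*(theta) = -(A^T A)^{-1} g.
+     const Eigen::Vector2d dx_dtheta = AtA.ldlt().solve(-g);
+     std::cout << dx_dtheta.transpose() << "\n";
+     return 0;
+   }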
diff --git a/docs/source/license.rst b/docs/source/license.rst
index a3c55c9..ed85f6a 100644
--- a/docs/source/license.rst
+++ b/docs/source/license.rst
@@ -12,7 +12,7 @@
 
 Ceres Solver is licensed under the New BSD license, whose terms are as follows.
 
-Copyright 2016 Google Inc. All rights reserved.
+Copyright 2023 Google Inc. All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
diff --git a/docs/source/loss.png b/docs/source/loss.png
index 9f98d00..5c9ac07 100644
--- a/docs/source/loss.png
+++ b/docs/source/loss.png
Binary files differ
diff --git a/docs/source/modeling_faqs.rst b/docs/source/modeling_faqs.rst
index a0c8f2f..0d23de4 100644
--- a/docs/source/modeling_faqs.rst
+++ b/docs/source/modeling_faqs.rst
@@ -37,7 +37,7 @@
    automatic and numeric differentiation. See
    :class:`CostFunctionToFunctor`.
 
-#. When using Quaternions,  consider using :class:`QuaternionParameterization`.
+#. When using Quaternions,  consider using :class:`QuaternionManifold`.
 
    `Quaternions <https://en.wikipedia.org/wiki/Quaternion>`_ are a
    four dimensional parameterization of the space of three dimensional
@@ -47,14 +47,14 @@
    associate a local parameterization with parameter blocks
    representing a Quaternion. Assuming that the order of entries in
    your parameter block is :math:`w,x,y,z`, you can use
-   :class:`QuaternionParameterization`.
+   :class:`QuaternionManifold`.
 
    .. NOTE::
 
      If you are using `Eigen's Quaternion
      <http://eigen.tuxfamily.org/dox/classEigen_1_1Quaternion.html>`_
      object, whose layout is :math:`x,y,z,w`, then you should use
-     :class:`EigenQuaternionParameterization`.
+     :class:`EigenQuaternionManifold`.
 
 
 #. How do I solve problems with general linear & non-linear
@@ -85,50 +85,4 @@
 
 #. How do I set one or more components of a parameter block constant?
 
-   Using :class:`SubsetParameterization`.
-
-#. Putting `Inverse Function Theorem
-   <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to use.
-
-   Every now and then we have to deal with functions which cannot be
-   evaluated analytically. Computing the Jacobian in such cases is
-   tricky. A particularly interesting case is where the inverse of the
-   function is easy to compute analytically. An example of such a
-   function is the Coordinate transformation between the `ECEF
-   <http://en.wikipedia.org/wiki/ECEF>`_ and the `WGS84
-   <http://en.wikipedia.org/wiki/World_Geodetic_System>`_ where the
-   conversion from WGS84 to ECEF is analytic, but the conversion
-   back to WGS84 uses an iterative algorithm. So how do you compute the
-   derivative of the ECEF to WGS84 transformation?
-
-   One obvious approach would be to numerically
-   differentiate the conversion function. This is not a good idea. For
-   one, it will be slow, but it will also be numerically quite
-   bad.
-
-   Turns out you can use the `Inverse Function Theorem
-   <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ in this
-   case to compute the derivatives more or less analytically.
-
-   The key result here is. If :math:`x = f^{-1}(y)`, and :math:`Df(x)`
-   is the invertible Jacobian of :math:`f` at :math:`x`. Then the
-   Jacobian :math:`Df^{-1}(y) = [Df(x)]^{-1}`, i.e., the Jacobian of
-   the :math:`f^{-1}` is the inverse of the Jacobian of :math:`f`.
-
-   Algorithmically this means that given :math:`y`, compute :math:`x =
-   f^{-1}(y)` by whatever means you can. Evaluate the Jacobian of
-   :math:`f` at :math:`x`. If the Jacobian matrix is invertible, then
-   its inverse is the Jacobian of :math:`f^{-1}(y)` at  :math:`y`.
-
-   One can put this into practice with the following code fragment.
-
-   .. code-block:: c++
-
-      Eigen::Vector3d ecef; // Fill some values
-      // Iterative computation.
-      Eigen::Vector3d lla = ECEFToLLA(ecef);
-      // Analytic derivatives
-      Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
-      bool invertible;
-      Eigen::Matrix3d ecef_to_lla_jacobian;
-      lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);
+   Using :class:`SubsetManifold`.
diff --git a/docs/source/nnls_covariance.rst b/docs/source/nnls_covariance.rst
index 66afd44..f95d246 100644
--- a/docs/source/nnls_covariance.rst
+++ b/docs/source/nnls_covariance.rst
@@ -1,3 +1,4 @@
+.. highlight:: c++
 
 .. default-domain:: cpp
 
@@ -115,7 +116,7 @@
      four dimensional quaternion used to parameterize :math:`SO(3)`,
      which is a three dimensional manifold. In cases like this, the
      user should use an appropriate
-     :class:`LocalParameterization`. Not only will this lead to better
+     :class:`Manifold`. Not only will this lead to better
      numerical behaviour of the Solver, it will also expose the rank
      deficiency to the :class:`Covariance` object so that it can
      handle it correctly.
@@ -166,7 +167,7 @@
       moderately fast algorithm suitable for small to medium sized
       matrices. For best performance we recommend using
       ``SuiteSparseQR`` which is enabled by setting
-      :member:`Covaraince::Options::sparse_linear_algebra_library_type`
+      :member:`Covariance::Options::sparse_linear_algebra_library_type`
       to ``SUITE_SPARSE``.
 
       ``SPARSE_QR`` cannot compute the covariance if the
@@ -187,6 +188,23 @@
       well as rank deficient Jacobians.
 
 
+.. member:: double Covariance::Options::column_pivot_threshold
+
+   Default: :math:`-1`
+
+   During QR factorization, if a column with Euclidean norm less than
+   ``column_pivot_threshold`` is encountered it is treated as zero.
+
+   If ``column_pivot_threshold < 0``, then an automatic default value
+   of ``20*(m+n)*eps*sqrt(max(diag(J'*J)))`` is used.  Here ``m`` and
+   ``n`` are the number of rows and columns of the Jacobian (``J``)
+   respectively.
+
+   This is an advanced option meant for users who know enough about
+   their Jacobian matrices that they can determine a value better
+   than the default.
+
+
 .. member:: int Covariance::Options::min_reciprocal_condition_number
 
    Default: :math:`10^{-14}`
@@ -221,7 +239,7 @@
       .. math:: \frac{\sigma_{\text{min}}}{\sigma_{\text{max}}}  < \sqrt{\text{min_reciprocal_condition_number}}
 
       where :math:`\sigma_{\text{min}}` and
-      :math:`\sigma_{\text{max}}` are the minimum and maxiumum
+      :math:`\sigma_{\text{max}}` are the minimum and maximum
       singular values of :math:`J` respectively.
 
    2. ``SPARSE_QR``
@@ -285,7 +303,7 @@
    entire documentation for :class:`Covariance::Options` before using
    :class:`Covariance`.
 
-.. function:: bool Covariance::Compute(const vector<pair<const double*, const double*> >& covariance_blocks, Problem* problem)
+.. function:: bool Covariance::Compute(const std::vector<std::pair<const double*, const double*> >& covariance_blocks, Problem* problem)
 
    Compute a part of the covariance matrix.
 
@@ -361,7 +379,7 @@
  Covariance::Options options;
  Covariance covariance(options);
 
- vector<pair<const double*, const double*> > covariance_blocks;
+ std::vector<std::pair<const double*, const double*> > covariance_blocks;
  covariance_blocks.push_back(make_pair(x, x));
  covariance_blocks.push_back(make_pair(y, y));
  covariance_blocks.push_back(make_pair(x, y));
diff --git a/docs/source/nnls_modeling.rst b/docs/source/nnls_modeling.rst
index c0c3227..be87149 100644
--- a/docs/source/nnls_modeling.rst
+++ b/docs/source/nnls_modeling.rst
@@ -1,3 +1,5 @@
+.. highlight:: c++
+
 .. default-domain:: cpp
 
 .. cpp:namespace:: ceres
@@ -50,7 +52,7 @@
 
 As a special case, when :math:`\rho_i(x) = x`, i.e., the identity
 function, and :math:`l_j = -\infty` and :math:`u_j = \infty` we get
-the more familiar unconstrained `non-linear least squares problem
+the usual unconstrained `non-linear least squares problem
 <http://en.wikipedia.org/wiki/Non-linear_least_squares>`_.
 
 .. math:: :label: ceresproblemunconstrained
@@ -80,12 +82,12 @@
      public:
       virtual bool Evaluate(double const* const* parameters,
                             double* residuals,
-                            double** jacobians) = 0;
-      const vector<int32>& parameter_block_sizes();
+                            double** jacobians) const = 0;
+      const std::vector<int32>& parameter_block_sizes();
       int num_residuals() const;
 
      protected:
-      vector<int32>* mutable_parameter_block_sizes();
+      std::vector<int32>* mutable_parameter_block_sizes();
       void set_num_residuals(int num_residuals);
     };
 
@@ -98,7 +100,7 @@
 the corresponding accessors. This information will be verified by the
 :class:`Problem` when added with :func:`Problem::AddResidualBlock`.
 
-.. function:: bool CostFunction::Evaluate(double const* const* parameters, double* residuals, double** jacobians)
+.. function:: bool CostFunction::Evaluate(double const* const* parameters, double* residuals, double** jacobians) const
 
    Compute the residual vector and the Jacobian matrices.
 
@@ -179,12 +181,19 @@
      class AutoDiffCostFunction : public
      SizedCostFunction<kNumResiduals, Ns> {
       public:
-       AutoDiffCostFunction(CostFunctor* functor, ownership = TAKE_OWNERSHIP);
+       // Instantiate CostFunctor using the supplied arguments.
+       template<class ...Args>
+       explicit AutoDiffCostFunction(Args&& ...args);
+       explicit AutoDiffCostFunction(std::unique_ptr<CostFunctor> functor);
+       explicit AutoDiffCostFunction(CostFunctor* functor, ownership = TAKE_OWNERSHIP);
+
        // Ignore the template parameter kNumResiduals and use
        // num_residuals instead.
        AutoDiffCostFunction(CostFunctor* functor,
                             int num_residuals,
                             ownership = TAKE_OWNERSHIP);
+       AutoDiffCostFunction(std::unique_ptr<CostFunctor> functor,
+                            int num_residuals);
      };
 
    To get an auto differentiated cost function, you must define a
@@ -242,9 +251,9 @@
 
    .. code-block:: c++
 
-    CostFunction* cost_function
-        = new AutoDiffCostFunction<MyScalarCostFunctor, 1, 2, 2>(
-            new MyScalarCostFunctor(1.0));              ^  ^  ^
+    auto* cost_function
+        = new AutoDiffCostFunction<MyScalarCostFunctor, 1, 2, 2>(1.0);
+                                                        ^  ^  ^
                                                         |  |  |
                             Dimension of residual ------+  |  |
                             Dimension of x ----------------+  |
@@ -270,7 +279,7 @@
    .. code-block:: c++
 
     MyScalarCostFunctor functor(1.0)
-    CostFunction* cost_function
+    auto* cost_function
         = new AutoDiffCostFunction<MyScalarCostFunctor, 1, 2, 2>(
             &functor, DO_NOT_TAKE_OWNERSHIP);
 
@@ -279,9 +288,11 @@
 
    .. code-block:: c++
 
-     CostFunction* cost_function
-         = new AutoDiffCostFunction<MyScalarCostFunctor, DYNAMIC, 2, 2>(
-             new CostFunctorWithDynamicNumResiduals(1.0),   ^     ^  ^
+     auto functor = std::make_unique<CostFunctorWithDynamicNumResiduals>(1.0);
+     auto* cost_function
+         = new AutoDiffCostFunction<CostFunctorWithDynamicNumResiduals,
+                                                         DYNAMIC, 2, 2>(
+             std::move(functor),                            ^     ^  ^
              runtime_number_of_residuals); <----+           |     |  |
                                                 |           |     |  |
                                                 |           |     |  |
@@ -290,13 +301,13 @@
                Dimension of x ------------------------------------+  |
                Dimension of y ---------------------------------------+
 
-   **WARNING 1** A common beginner's error when first using
-   :class:`AutoDiffCostFunction` is to get the sizing wrong. In particular,
-   there is a tendency to set the template parameters to (dimension of
-   residual, number of parameters) instead of passing a dimension
-   parameter for *every parameter block*. In the example above, that
-   would be ``<MyScalarCostFunction, 1, 2>``, which is missing the 2
-   as the last template argument.
+   .. warning::
+       A common beginner's error when first using :class:`AutoDiffCostFunction`
+       is to get the sizing wrong. In particular, there is a tendency to set the
+       template parameters to (dimension of residual, number of parameters)
+       instead of passing a dimension parameter for *every parameter block*. In
+       the example above, that would be ``<MyScalarCostFunction, 1, 2>``, which
+       is missing the 2 as the last template argument.
 
 
 :class:`DynamicAutoDiffCostFunction`
@@ -334,9 +345,7 @@
 
      .. code-block:: c++
 
-       DynamicAutoDiffCostFunction<MyCostFunctor, 4>* cost_function =
-         new DynamicAutoDiffCostFunction<MyCostFunctor, 4>(
-           new MyCostFunctor());
+       auto* cost_function = new DynamicAutoDiffCostFunction<MyCostFunctor, 4>();
        cost_function->AddParameterBlock(5);
        cost_function->AddParameterBlock(10);
        cost_function->SetNumResiduals(21);
@@ -441,9 +450,9 @@
 
   .. code-block:: c++
 
-    CostFunction* cost_function
-        = new NumericDiffCostFunction<MyScalarCostFunctor, CENTRAL, 1, 2, 2>(
-            new MyScalarCostFunctor(1.0));                    ^     ^  ^  ^
+    auto* cost_function
+        = new NumericDiffCostFunction<MyScalarCostFunctor, CENTRAL, 1, 2, 2>(1.0)
+                                                              ^     ^  ^  ^
                                                               |     |  |  |
                                   Finite Differencing Scheme -+     |  |  |
                                   Dimension of residual ------------+  |  |
@@ -463,17 +472,18 @@
 
    .. code-block:: c++
 
-     CostFunction* cost_function
-         = new NumericDiffCostFunction<MyScalarCostFunctor, CENTRAL, DYNAMIC, 2, 2>(
-             new CostFunctorWithDynamicNumResiduals(1.0),               ^     ^  ^
-             TAKE_OWNERSHIP,                                            |     |  |
-             runtime_number_of_residuals); <----+                       |     |  |
-                                                |                       |     |  |
-                                                |                       |     |  |
-               Actual number of residuals ------+                       |     |  |
-               Indicate dynamic number of residuals --------------------+     |  |
-               Dimension of x ------------------------------------------------+  |
-               Dimension of y ---------------------------------------------------+
+     auto functor = std::make_unique<CostFunctorWithDynamicNumResiduals>(1.0);
+     auto* cost_function
+         = new NumericDiffCostFunction<CostFunctorWithDynamicNumResiduals,
+                                                CENTRAL, DYNAMIC, 2, 2>(
+             std::move(functor),                            ^     ^  ^
+             runtime_number_of_residuals); <----+           |     |  |
+                                                |           |     |  |
+                                                |           |     |  |
+               Actual number of residuals ------+           |     |  |
+               Indicate dynamic number of residuals --------+     |  |
+               Dimension of x ------------------------------------+  |
+               Dimension of y ---------------------------------------+
 
 
   There are three available numeric differentiation schemes in ceres-solver:
@@ -502,18 +512,18 @@
   results, either try forward difference to improve performance or
   Ridders' method to improve accuracy.
 
-  **WARNING** A common beginner's error when first using
-  :class:`NumericDiffCostFunction` is to get the sizing wrong. In
-  particular, there is a tendency to set the template parameters to
-  (dimension of residual, number of parameters) instead of passing a
-  dimension parameter for *every parameter*. In the example above,
-  that would be ``<MyScalarCostFunctor, 1, 2>``, which is missing the
-  last ``2`` argument. Please be careful when setting the size
-  parameters.
+  .. warning::
+      A common beginner's error when first using
+      :class:`NumericDiffCostFunction` is to get the sizing wrong. In
+      particular, there is a tendency to set the template parameters to
+      (dimension of residual, number of parameters) instead of passing a
+      dimension parameter for *every parameter*. In the example above, that
+      would be ``<MyScalarCostFunctor, 1, 2>``, which is missing the last ``2``
+      argument. Please be careful when setting the size parameters.
 
 
-Numeric Differentiation & LocalParameterization
------------------------------------------------
+Numeric Differentiation & Manifolds
+-----------------------------------
 
    If your cost function depends on a parameter block that must lie on
    a manifold and the functor cannot be evaluated for values of that
@@ -522,11 +532,10 @@
 
    This is because numeric differentiation in Ceres is performed by
    perturbing the individual coordinates of the parameter blocks that
-   a cost functor depends on. In doing so, we assume that the
-   parameter blocks live in an Euclidean space and ignore the
-   structure of manifold that they live As a result some of the
-   perturbations may not lie on the manifold corresponding to the
-   parameter block.
+   a cost functor depends on. This perturbation assumes that the
+   parameter block lives on a Euclidean manifold rather than on the
+   actual manifold associated with the parameter block. As a result,
+   some of the perturbed points may no longer lie on the manifold.
 
    For example consider a four dimensional parameter block that is
    interpreted as a unit Quaternion. Perturbing the coordinates of
@@ -534,7 +543,7 @@
    parameter block.
 
    Fixing this problem requires that :class:`NumericDiffCostFunction`
-   be aware of the :class:`LocalParameterization` associated with each
+   be aware of the :class:`Manifold` associated with each
    parameter block and only generate perturbations in the local
    tangent space of each parameter block.
 
@@ -568,9 +577,8 @@
 
    .. code-block:: c++
 
-     CostFunction* cost_function
-         = new NumericDiffCostFunction<MyCostFunction, CENTRAL, 1, 4, 8>(
-             new MyCostFunction(...), TAKE_OWNERSHIP);
+     auto* cost_function
+         = new NumericDiffCostFunction<MyCostFunction, CENTRAL, 1, 4, 8>(...);
 
    where ``MyCostFunction`` has 1 residual and 2 parameter blocks with
    sizes 4 and 8 respectively. Look at the tests for a more detailed
@@ -611,8 +619,7 @@
 
      .. code-block:: c++
 
-       DynamicNumericDiffCostFunction<MyCostFunctor>* cost_function =
-         new DynamicNumericDiffCostFunction<MyCostFunctor>(new MyCostFunctor);
+       auto cost_function = std::make_unique<DynamicNumericDiffCostFunction<MyCostFunctor>>();
        cost_function->AddParameterBlock(5);
        cost_function->AddParameterBlock(10);
        cost_function->SetNumResiduals(21);
@@ -620,9 +627,9 @@
    As a rule of thumb, try using :class:`NumericDiffCostFunction` before
    you use :class:`DynamicNumericDiffCostFunction`.
 
-   **WARNING** The same caution about mixing local parameterizations
-   with numeric differentiation applies as is the case with
-   :class:`NumericDiffCostFunction`.
+   .. warning::
+       The same caution about mixing manifolds with numeric differentiation
+       applies as is the case with :class:`NumericDiffCostFunction`.
 
 :class:`CostFunctionToFunctor`
 ==============================
@@ -671,8 +678,8 @@
    .. code-block:: c++
 
     struct CameraProjection {
-      CameraProjection(double* observation)
-      : intrinsic_projection_(new IntrinsicProjection(observation)) {
+      explicit CameraProjection(double* observation)
+      : intrinsic_projection_(std::make_unique<IntrinsicProjection>(observation)) {
       }
 
       template <typename T>
@@ -690,7 +697,7 @@
       }
 
      private:
-      CostFunctionToFunctor<2,5,3> intrinsic_projection_;
+      CostFunctionToFunctor<2, 5, 3> intrinsic_projection_;
     };
 
    Note that :class:`CostFunctionToFunctor` takes ownership of the
@@ -732,10 +739,9 @@
   .. code-block:: c++
 
    struct CameraProjection {
-     CameraProjection(double* observation)
+     explicit CameraProjection(double* observation)
         : intrinsic_projection_(
-              new NumericDiffCostFunction<IntrinsicProjection, CENTRAL, 2, 5, 3>(
-                  new IntrinsicProjection(observation))) {}
+              std::make_unique<NumericDiffCostFunction<IntrinsicProjection, CENTRAL, 2, 5, 3>>(observation)) {}
 
      template <typename T>
      bool operator()(const T* rotation,
@@ -793,8 +799,8 @@
    .. code-block:: c++
 
     struct CameraProjection {
-      CameraProjection(double* observation)
-          : intrinsic_projection_(new IntrinsicProjection(observation)) {
+      explicit CameraProjection(double* observation)
+          : intrinsic_projection_(std::make_unique<IntrinsicProjection>(observation)) {
       }
 
       template <typename T>
@@ -841,7 +847,7 @@
        //  my_cost_function produces N residuals
        CostFunction* my_cost_function = ...
        CHECK_EQ(N, my_cost_function->num_residuals());
-       vector<CostFunction*> conditioners;
+       std::vector<CostFunction*> conditioners;
 
        //  Make N 1x1 cost functions (1 parameter, 1 residual)
        CostFunction* f_1 = ...
@@ -864,38 +870,39 @@
 
 
 :class:`GradientChecker`
-================================
+========================
 
 .. class:: GradientChecker
 
-    This class compares the Jacobians returned by a cost function against
-    derivatives estimated using finite differencing. It is meant as a tool for
-    unit testing, giving you more fine-grained control than the check_gradients
-    option in the solver options.
+    This class compares the Jacobians returned by a cost function
+    against derivatives estimated using finite differencing. It is
+    meant as a tool for unit testing, giving you more fine-grained
+    control than the check_gradients option in the solver options.
 
     The condition enforced is that
 
     .. math:: \forall{i,j}: \frac{J_{ij} - J'_{ij}}{max_{ij}(J_{ij} - J'_{ij})} < r
 
-    where :math:`J_{ij}` is the jacobian as computed by the supplied cost
-    function (by the user) multiplied by the local parameterization Jacobian,
+    where :math:`J_{ij}` is the jacobian as computed by the supplied
+    cost function multiplied by the `Manifold::PlusJacobian`,
     :math:`J'_{ij}` is the jacobian as computed by finite differences,
-    multiplied by the local parameterization Jacobian as well, and :math:`r`
+    multiplied by the `Manifold::PlusJacobian` as well, and :math:`r`
     is the relative precision.
 
    Usage:
 
    .. code-block:: c++
 
-       //  my_cost_function takes two parameter blocks. The first has a local
-       //  parameterization associated with it.
+       // my_cost_function takes two parameter blocks. The first has a
+       // manifold associated with it.
+
        CostFunction* my_cost_function = ...
-       LocalParameterization* my_parameterization = ...
+       Manifold* my_manifold = ...
        NumericDiffOptions numeric_diff_options;
 
-       std::vector<LocalParameterization*> local_parameterizations;
-       local_parameterizations.push_back(my_parameterization);
-       local_parameterizations.push_back(nullptr);
+       std::vector<Manifold*> manifolds;
+       manifolds.push_back(my_manifold);
+       manifolds.push_back(nullptr);
 
       std::vector<double> parameter1;
       std::vector<double> parameter2;
@@ -906,7 +913,8 @@
        parameter_blocks.push_back(parameter2.data());
 
        GradientChecker gradient_checker(my_cost_function,
-           local_parameterizations, numeric_diff_options);
+                                        manifolds,
+                                        numeric_diff_options);
       GradientChecker::ProbeResults results;
       if (!gradient_checker.Probe(parameter_blocks.data(), 1e-9, &results)) {
          LOG(ERROR) << "An error has occurred:\n" << results.error_log;
@@ -1065,6 +1073,10 @@
 
    .. math:: \rho(s,a,b) = b \log(1 + e^{(s - a) / b}) - b \log(1 + e^{-a / b})
 
+.. class:: TukeyLoss
+
+   .. math:: \rho(s) = \begin{cases} \frac{1}{3} (1 - (1 - s)^3) & s \le 1\\ \frac{1}{3} & s > 1 \end{cases}
+
 .. class:: ComposedLoss
 
    Given two loss functions ``f`` and ``g``, implements the loss
@@ -1115,9 +1127,8 @@
 
      // Add parameter blocks
 
-     CostFunction* cost_function =
-         new AutoDiffCostFunction < UW_Camera_Mapper, 2, 9, 3>(
-             new UW_Camera_Mapper(feature_x, feature_y));
+     auto* cost_function =
+         new AutoDiffCostFunction<UW_Camera_Mapper, 2, 9, 3>(feature_x, feature_y);
 
     LossFunctionWrapper* loss_function =
         new LossFunctionWrapper(new HuberLoss(1.0), TAKE_OWNERSHIP);
      problem.AddResidualBlock(cost_function, loss_function, parameters);
@@ -1154,7 +1165,6 @@
 matrix such that the robustified Gauss-Newton step corresponds to an
 ordinary linear least squares problem.
 
-
 Let :math:`\alpha` be a root of
 
 .. math:: \frac{1}{2}\alpha^2 - \alpha - \frac{\rho''}{\rho'}\|f(x)\|^2 = 0.
@@ -1178,363 +1188,644 @@
 problems.
 
 
-:class:`LocalParameterization`
-==============================
-
-.. class:: LocalParameterization
-
-  In many optimization problems, especially sensor fusion problems,
-  one has to model quantities that live in spaces known as `Manifolds
-  <https://en.wikipedia.org/wiki/Manifold>`_ , for example the
-  rotation/orientation of a sensor that is represented by a
-  `Quaternion
-  <https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation>`_.
-
-  Manifolds are spaces, which locally look like Euclidean spaces. More
-  precisely, at each point on the manifold there is a linear space
-  that is tangent to the manifold. It has dimension equal to the
-  intrinsic dimension of the manifold itself, which is less than or
-  equal to the ambient space in which the manifold is embedded.
-
-  For example, the tangent space to a point on a sphere in three
-  dimensions is the two dimensional plane that is tangent to the
-  sphere at that point. There are two reasons tangent spaces are
-  interesting:
-
-  1. They are Euclidean spaces, so the usual vector space operations
-     apply there, which makes numerical operations easy.
-
-  2. Movement in the tangent space translate into movements along the
-     manifold.  Movements perpendicular to the tangent space do not
-     translate into movements on the manifold.
-
-     Returning to our sphere example, moving in the 2 dimensional
-     plane tangent to the sphere and projecting back onto the sphere
-     will move you away from the point you started from but moving
-     along the normal at the same point and the projecting back onto
-     the sphere brings you back to the point.
-
-  Besides the mathematical niceness, modeling manifold valued
-  quantities correctly and paying attention to their geometry has
-  practical benefits too:
-
-  1. It naturally constrains the quantity to the manifold through out
-     the optimization. Freeing the user from hacks like *quaternion
-     normalization*.
-
-  2. It reduces the dimension of the optimization problem to its
-     *natural* size. For example, a quantity restricted to a line, is a
-     one dimensional object regardless of the dimension of the ambient
-     space in which this line lives.
-
-     Working in the tangent space reduces not just the computational
-     complexity of the optimization algorithm, but also improves the
-     numerical behaviour of the algorithm.
-
-  A basic operation one can perform on a manifold is the
-  :math:`\boxplus` operation that computes the result of moving along
-  delta in the tangent space at x, and then projecting back onto the
-  manifold that x belongs to. Also known as a *Retraction*,
-  :math:`\boxplus` is a generalization of vector addition in Euclidean
-  spaces. Formally, :math:`\boxplus` is a smooth map from a
-  manifold :math:`\mathcal{M}` and its tangent space
-  :math:`T_\mathcal{M}` to the manifold :math:`\mathcal{M}` that
-  obeys the identity
-
-  .. math::  \boxplus(x, 0) = x,\quad \forall x.
-
-  That is, it ensures that the tangent space is *centered* at :math:`x`
-  and the zero vector is the identity element. For more see
-  [Hertzberg]_ and section A.6.9 of [HartleyZisserman]_.
-
-  Let us consider two examples:
-
-  The Euclidean space :math:`R^n` is the simplest example of a
-  manifold. It has dimension :math:`n` (and so does its tangent space)
-  and :math:`\boxplus` is the familiar vector sum operation.
-
-    .. math:: \boxplus(x, \Delta) = x + \Delta
-
-  A more interesting case is :math:`SO(3)`, the special orthogonal
-  group in three dimensions - the space of 3x3 rotation
-  matrices. :math:`SO(3)` is a three dimensional manifold embedded in
-  :math:`R^9` or :math:`R^{3\times 3}`.
-
-  :math:`\boxplus` on :math:`SO(3)` is defined using the *Exponential*
-  map, from the tangent space (:math:`R^3`) to the manifold. The
-  Exponential map :math:`\operatorname{Exp}` is defined as:
-
-  .. math::
-
-     \operatorname{Exp}([p,q,r]) = \left [ \begin{matrix}
-     \cos \theta + cp^2 & -sr + cpq        &  sq + cpr \\
-     sr + cpq         & \cos \theta + cq^2& -sp + cqr \\
-     -sq + cpr        & sp + cqr         & \cos \theta + cr^2
-     \end{matrix} \right ]
-
-  where,
-
-  .. math::
-     \theta = \sqrt{p^2 + q^2 + r^2}, s = \frac{\sin \theta}{\theta},
-     c = \frac{1 - \cos \theta}{\theta^2}.
-
-  Then,
-
-  .. math::
-
-     \boxplus(x, \Delta) = x \operatorname{Exp}(\Delta)
-
-  The ``LocalParameterization`` interface allows the user to define
-  and associate with parameter blocks the manifold that they belong
-  to. It does so by defining the ``Plus`` (:math:`\boxplus`) operation
-  and its derivative with respect to :math:`\Delta` at :math:`\Delta =
-  0`.
-
-   .. code-block:: c++
-
-     class LocalParameterization {
-      public:
-       virtual ~LocalParameterization() {}
-       virtual bool Plus(const double* x,
-                         const double* delta,
-                         double* x_plus_delta) const = 0;
-       virtual bool ComputeJacobian(const double* x, double* jacobian) const = 0;
-       virtual bool MultiplyByJacobian(const double* x,
-                                       const int num_rows,
-                                       const double* global_matrix,
-                                       double* local_matrix) const;
-       virtual int GlobalSize() const = 0;
-       virtual int LocalSize() const = 0;
-     };
+While the theory described above is elegant, in practice we observe
+that using the Triggs correction when :math:`\rho'' > 0` leads to poor
+performance, so we upper bound it by zero. For more details see
+`corrector.cc <https://github.com/ceres-solver/ceres-solver/blob/master/internal/ceres/corrector.cc#L51>`_
 
 
-.. function:: int LocalParameterization::GlobalSize()
+:class:`Manifold`
+==================
 
-   The dimension of the ambient space in which the parameter block
-   :math:`x` lives.
+.. class:: Manifold
 
-.. function:: int LocalParameterization::LocalSize()
+In sensor fusion problems, we often have to model quantities that live
+in spaces known as `Manifolds
+<https://en.wikipedia.org/wiki/Manifold>`_, for example the
+rotation/orientation of a sensor that is represented by a `Quaternion
+<https://en.wikipedia.org/wiki/Quaternion>`_.
 
-   The size of the tangent space that :math:`\Delta` lives in.
+Manifolds are spaces which locally look like Euclidean spaces. More
+precisely, at each point on the manifold there is a linear space that
+is tangent to the manifold. It has dimension equal to the intrinsic
+dimension of the manifold itself, which is less than or equal to the
+ambient space in which the manifold is embedded.
 
-.. function:: bool LocalParameterization::Plus(const double* x, const double* delta, double* x_plus_delta) const
+For example, the tangent space to a point on a sphere in three
+dimensions is the two dimensional plane that is tangent to the sphere
+at that point. There are two reasons tangent spaces are interesting:
 
-    :func:`LocalParameterization::Plus` implements :math:`\boxplus(x,\Delta)`.
+1. They are Euclidean spaces, so the usual vector space operations apply
+   there, which makes numerical operations easy.
 
-.. function:: bool LocalParameterization::ComputeJacobian(const double* x, double* jacobian) const
+2. Movements in the tangent space translate into movements along the
+   manifold.  Movements perpendicular to the tangent space do not
+   translate into movements on the manifold.
 
-   Computes the Jacobian matrix
+Returning to the sphere example, moving along the two dimensional
+plane tangent to the sphere and projecting back onto the sphere will
+move you away from the point you started from, but moving along the
+normal at the same point and then projecting back onto the sphere
+brings you back to the point.
 
-   .. math:: J = D_2 \boxplus(x, 0)
+Besides the mathematical niceness, modeling manifold valued
+quantities correctly and paying attention to their geometry has
+practical benefits too:
 
-   in row major form.
+1. It naturally constrains the quantity to the manifold throughout the
+   optimization, freeing the user from hacks like *quaternion
+   normalization*.
 
-.. function:: bool MultiplyByJacobian(const double* x, const int num_rows, const double* global_matrix, double* local_matrix) const
+2. It reduces the dimension of the optimization problem to its
+   *natural* size. For example, a quantity restricted to a line is a
+   one dimensional object regardless of the dimension of the ambient
+   space in which this line lives.
 
-   ``local_matrix = global_matrix * jacobian``
+   Working in the tangent space reduces not just the computational
+   complexity of the optimization algorithm, but also improves the
+   numerical behaviour of the algorithm.
 
-   ``global_matrix`` is a ``num_rows x GlobalSize``  row major matrix.
-   ``local_matrix`` is a ``num_rows x LocalSize`` row major matrix.
-   ``jacobian`` is the matrix returned by :func:`LocalParameterization::ComputeJacobian` at :math:`x`.
+A basic operation one can perform on a manifold is the
+:math:`\boxplus` operation that computes the result of moving along
+:math:`\delta` in the tangent space at :math:`x`, and then projecting
+back onto the manifold that :math:`x` belongs to. Also known as a
+*Retraction*, :math:`\boxplus` is a generalization of vector addition
+in Euclidean spaces.
 
-   This is only used by :class:`GradientProblem`. For most normal
-   uses, it is okay to use the default implementation.
+The inverse of :math:`\boxplus` is :math:`\boxminus`, which given two
+points :math:`y` and :math:`x` on the manifold computes the tangent
+vector :math:`\Delta` at :math:`x` s.t. :math:`\boxplus(x, \Delta) =
+y`.
+
+Let us now consider two examples.
+
+The `Euclidean space <https://en.wikipedia.org/wiki/Euclidean_space>`_
+:math:`\mathbb{R}^n` is the simplest example of a manifold. It has
+dimension :math:`n` (and so does its tangent space) and
+:math:`\boxplus` and :math:`\boxminus` are the familiar vector sum and
+difference operations.
+
+.. math::
+   \begin{align*}
+   \boxplus(x, \Delta) &= x + \Delta = y\\
+   \boxminus(y, x) &= y - x = \Delta.
+   \end{align*}
+
+A more interesting case is :math:`SO(3)`, the `special
+orthogonal group <https://en.wikipedia.org/wiki/3D_rotation_group>`_
+in three dimensions - the space of :math:`3\times3` rotation
+matrices. :math:`SO(3)` is a three dimensional manifold embedded in
+:math:`\mathbb{R}^9` or :math:`\mathbb{R}^{3\times 3}`.  So points on :math:`SO(3)` are
+represented using 9 dimensional vectors or :math:`3\times 3` matrices,
+and points in its tangent spaces are represented by 3 dimensional
+vectors.
+
+For :math:`SO(3)`, :math:`\boxplus` and :math:`\boxminus` are defined
+in terms of the matrix :math:`\exp` and :math:`\log` operations as
+follows:
+
+Given a 3-vector :math:`\Delta = [\begin{matrix}p,&q,&r\end{matrix}]`, we have
+
+.. math::
+
+   \exp(\Delta) = \left [ \begin{matrix}
+   \cos \theta + cp^2 & -sr + cpq        &  sq + cpr \\
+   sr + cpq         & \cos \theta + cq^2& -sp + cqr \\
+   -sq + cpr        & sp + cqr         & \cos \theta + cr^2
+   \end{matrix} \right ]
+
+where,
+
+.. math::
+     \begin{align}
+     \theta &= \sqrt{p^2 + q^2 + r^2},\\
+     s &= \frac{\sin \theta}{\theta},\\
+     c &= \frac{1 - \cos \theta}{\theta^2}.
+     \end{align}
+
+Given :math:`x \in SO(3)`, we have
+
+.. math::
+
+   \log(x) = \frac{\theta}{2 \sin \theta}\left[\begin{matrix} x_{32} - x_{23},& x_{13} - x_{31},& x_{21} - x_{12}\end{matrix} \right]
+
+
+where,
+
+.. math:: \theta = \cos^{-1}((\operatorname{Trace}(x) - 1)/2)
+
+Then,
+
+.. math::
+   \begin{align*}
+   \boxplus(x, \Delta) &= x \exp(\Delta)
+   \\
+   \boxminus(y, x) &= \log(x^T y)
+   \end{align*}
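+
+For illustration, here is a minimal sketch (not part of Ceres) of how
+these :math:`SO(3)` operations could be evaluated numerically using
+Eigen's ``AngleAxisd``, which implements the same exponential and
+logarithm maps:
+
+.. code-block:: c++
+
+   #include <Eigen/Geometry>
+
+   // Plus: x * exp(delta), where delta is a 3-vector in the tangent space.
+   Eigen::Matrix3d BoxPlus(const Eigen::Matrix3d& x,
+                           const Eigen::Vector3d& delta) {
+     const double theta = delta.norm();
+     const Eigen::Vector3d axis =
+         theta > 0.0 ? Eigen::Vector3d(delta / theta) : Eigen::Vector3d::UnitX();
+     return x * Eigen::AngleAxisd(theta, axis).toRotationMatrix();
+   }
+
+   // Minus: log(x^T y), returned as a 3-vector in the tangent space at x.
+   Eigen::Vector3d BoxMinus(const Eigen::Matrix3d& y,
+                            const Eigen::Matrix3d& x) {
+     const Eigen::AngleAxisd aa(Eigen::Matrix3d(x.transpose() * y));
+     return aa.angle() * aa.axis();
+   }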
+
+For :math:`\boxplus` and :math:`\boxminus` to be mathematically
+consistent, the following identities must be satisfied at all points
+:math:`x` on the manifold:
+
+1. :math:`\boxplus(x, 0) = x`. This ensures that the tangent space is
+   *centered* at :math:`x`, and the zero vector is the identity
+   element.
+2. For all :math:`y` on the manifold, :math:`\boxplus(x,
+   \boxminus(y,x)) = y`. This ensures that any :math:`y` can be
+   reached from :math:`x`.
+3. For all :math:`\Delta`, :math:`\boxminus(\boxplus(x, \Delta), x) =
+   \Delta`. This ensures that :math:`\boxplus` is an injective
+   (one-to-one) map.
+4. For all :math:`\Delta_1, \Delta_2`, :math:`\|\boxminus(\boxplus(x, \Delta_1),
+   \boxplus(x, \Delta_2))\| \leq \|\Delta_1 - \Delta_2\|`. This allows us to
+   define a metric on the manifold.
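+
+For the Euclidean case these identities are easy to verify directly;
+for example, identity 2 reduces to
+
+.. math::
+   \boxplus(x, \boxminus(y, x)) = x + (y - x) = y.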
+
+Additionally we require that :math:`\boxplus` and :math:`\boxminus` be
+sufficiently smooth. In particular they need to be differentiable
+everywhere on the manifold.
+
+For more details, please see [Hertzberg]_.
+
+The :class:`Manifold` interface allows the user to define a manifold
+for the purposes of optimization by implementing ``Plus`` and ``Minus``
+operations and their derivatives (corresponding naturally to
+:math:`\boxplus` and :math:`\boxminus`).
+
+.. code-block:: c++
+
+  class Manifold {
+   public:
+    virtual ~Manifold();
+    virtual int AmbientSize() const = 0;
+    virtual int TangentSize() const = 0;
+    virtual bool Plus(const double* x,
+                      const double* delta,
+                      double* x_plus_delta) const = 0;
+    virtual bool PlusJacobian(const double* x, double* jacobian) const = 0;
+    virtual bool RightMultiplyByPlusJacobian(const double* x,
+                                             const int num_rows,
+                                             const double* ambient_matrix,
+                                             double* tangent_matrix) const;
+    virtual bool Minus(const double* y,
+                       const double* x,
+                       double* y_minus_x) const = 0;
+    virtual bool MinusJacobian(const double* x, double* jacobian) const = 0;
+  };
+
+
+.. function:: int Manifold::AmbientSize() const;
+
+   Dimension of the ambient space in which the manifold is embedded.
+
+.. function:: int Manifold::TangentSize() const;
+
+   Dimension of the manifold/tangent space.
+
+.. function:: bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
+
+   Implements the :math:`\boxplus(x,\Delta)` operation for the manifold.
+
+   A generalization of vector addition in Euclidean space, ``Plus``
+   computes the result of moving along ``delta`` in the tangent space
+   at ``x``, and then projecting back onto the manifold that ``x``
+   belongs to.
+
+   ``x`` and ``x_plus_delta`` are :func:`Manifold::AmbientSize` vectors.
+   ``delta`` is a :func:`Manifold::TangentSize` vector.
+
+   Return value indicates if the operation was successful or not.
+
+.. function:: bool PlusJacobian(const double* x, double* jacobian) const;
+
+   Compute the derivative of :math:`\boxplus(x, \Delta)` w.r.t
+   :math:`\Delta` at :math:`\Delta = 0`, i.e. :math:`(D_2
+   \boxplus)(x, 0)`.
+
+   ``jacobian`` is a row-major :func:`Manifold::AmbientSize`
+   :math:`\times` :func:`Manifold::TangentSize` matrix.
+
+   Return value indicates whether the operation was successful or not.
+
+.. function:: bool RightMultiplyByPlusJacobian(const double* x, const int num_rows, const double* ambient_matrix, double* tangent_matrix) const;
+
+   ``tangent_matrix`` = ``ambient_matrix`` :math:`\times` plus_jacobian.
+
+
+   ``ambient_matrix`` is a row-major ``num_rows`` :math:`\times`
+   :func:`Manifold::AmbientSize` matrix.
+
+   ``tangent_matrix`` is a row-major ``num_rows`` :math:`\times`
+   :func:`Manifold::TangentSize` matrix.
+
+   Return value indicates whether the operation was successful or not.
+
+   This function is only used by the :class:`GradientProblemSolver`,
+   where the dimension of the parameter block can be large and it may
+   be more efficient to compute this product directly rather than
+   first evaluating the Jacobian into a matrix and then doing a matrix
+   vector product.
+
+   Because this is not an often used function, we provide a default
+   implementation for convenience. If performance becomes an issue
+   then the user should consider implementing a specialization.
+
+.. function:: bool Minus(const double* y, const double* x, double* y_minus_x) const;
+
+   Implements :math:`\boxminus(y,x)` operation for the manifold.
+
+   A generalization of vector subtraction in Euclidean spaces, given
+   two points ``x`` and ``y`` on the manifold, ``Minus`` computes the
+   change to ``x`` in the tangent space at ``x``, that will take it to
+   ``y``.
+
+   ``x`` and ``y`` are :func:`Manifold::AmbientSize` vectors.
+   ``y_minus_x`` is a :func:`Manifold::TangentSize` vector.
+
+   Return value indicates if the operation was successful or not.
+
+.. function:: bool MinusJacobian(const double* x, double* jacobian) const;
+
+   Compute the derivative of :math:`\boxminus(y, x)` w.r.t :math:`y`
+   at :math:`y = x`, i.e. :math:`(D_1 \boxminus)(x, x)`.
+
+   ``jacobian`` is a row-major :func:`Manifold::TangentSize`
+   :math:`\times` :func:`Manifold::AmbientSize` matrix.
+
+   Return value indicates whether the operation was successful or not.
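+
+As an illustration of the interface, the following is a minimal sketch
+(for exposition only; Ceres already ships an equivalent
+:class:`EuclideanManifold`) of a concrete subclass implementing a three
+dimensional Euclidean manifold:
+
+.. code-block:: c++
+
+   class MyEuclideanManifold3 final : public ceres::Manifold {
+    public:
+     int AmbientSize() const override { return 3; }
+     int TangentSize() const override { return 3; }
+
+     bool Plus(const double* x,
+               const double* delta,
+               double* x_plus_delta) const override {
+       for (int i = 0; i < 3; ++i) x_plus_delta[i] = x[i] + delta[i];
+       return true;
+     }
+
+     bool PlusJacobian(const double* x, double* jacobian) const override {
+       // Row-major 3x3 identity.
+       for (int i = 0; i < 9; ++i) jacobian[i] = 0.0;
+       jacobian[0] = jacobian[4] = jacobian[8] = 1.0;
+       return true;
+     }
+
+     bool Minus(const double* y,
+                const double* x,
+                double* y_minus_x) const override {
+       for (int i = 0; i < 3; ++i) y_minus_x[i] = y[i] - x[i];
+       return true;
+     }
+
+     bool MinusJacobian(const double* x, double* jacobian) const override {
+       // Row-major 3x3 identity.
+       for (int i = 0; i < 9; ++i) jacobian[i] = 0.0;
+       jacobian[0] = jacobian[4] = jacobian[8] = 1.0;
+       return true;
+     }
+   };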
 
 Ceres Solver ships with a number of commonly used instances of
-:class:`LocalParameterization`. Another great place to find high
-quality implementations of :math:`\boxplus` operations on a variety of
-manifolds is the `Sophus <https://github.com/strasdat/Sophus>`_
-library developed by Hauke Strasdat and his collaborators.
+:class:`Manifold`.
 
-:class:`IdentityParameterization`
----------------------------------
+For `Lie Groups <https://en.wikipedia.org/wiki/Lie_group>`_, a great
+place to find high quality implementations is the `Sophus
+<https://github.com/strasdat/Sophus>`_ library developed by Hauke
+Strasdat and his collaborators.
 
-A trivial version of :math:`\boxplus` is when :math:`\Delta` is of the
-same size as :math:`x` and
+:class:`EuclideanManifold`
+--------------------------
 
-.. math::  \boxplus(x, \Delta) = x + \Delta
+.. class:: EuclideanManifold
 
-This is the same as :math:`x` living in a Euclidean manifold.
+:class:`EuclideanManifold`, as the name implies, represents a Euclidean
+space, where the :math:`\boxplus` and :math:`\boxminus` operations are
+the usual vector addition and subtraction.
 
-:class:`QuaternionParameterization`
------------------------------------
+.. math::
 
-Another example that occurs commonly in Structure from Motion problems
-is when camera rotations are parameterized using a quaternion. This is
-a 3-dimensional manifold that lives in 4-dimensional space.
+   \begin{align*}
+     \boxplus(x, \Delta) &= x + \Delta\\
+      \boxminus(y,x) &= y - x
+   \end{align*}
 
-.. math:: \boxplus(x, \Delta) = \left[ \cos(|\Delta|), \frac{\sin\left(|\Delta|\right)}{|\Delta|} \Delta \right] * x
+By default parameter blocks are assumed to be Euclidean, so there is
+no need to use this manifold on its own. It is provided for the
+purpose of testing and for use in combination with other manifolds
+using :class:`ProductManifold`.
 
-The multiplication :math:`*` between the two 4-vectors on the right
-hand side is the standard quaternion product.
+The class works with dynamic and static ambient space dimensions. If
+the ambient space dimension is known at compile time, use
 
-:class:`EigenQuaternionParameterization`
-----------------------------------------
+.. code-block:: c++
 
-`Eigen <http://eigen.tuxfamily.org/index.php?title=Main_Page>`_ uses a
-different internal memory layout for the elements of the quaternion
-than what is commonly used. Specifically, Eigen stores the elements in
-memory as :math:`(x, y, z, w)`, i.e., the *real* part (:math:`w`) is
-stored as the last element. Note, when creating an Eigen quaternion
-through the constructor the elements are accepted in :math:`w, x, y,
-z` order.
+   EuclideanManifold<3> manifold;
 
-Since Ceres operates on parameter blocks which are raw ``double``
-pointers this difference is important and requires a different
-parameterization. :class:`EigenQuaternionParameterization` uses the
-same ``Plus`` operation as :class:`QuaternionParameterization` but
-takes into account Eigen's internal memory element ordering.
+If the ambient space dimension is not known at compile time, the
+template parameter needs to be set to ``ceres::DYNAMIC`` and the actual
+dimension needs to be provided as a constructor argument:
 
-:class:`SubsetParameterization`
--------------------------------
+.. code-block:: c++
+
+   EuclideanManifold<ceres::DYNAMIC> manifold(ambient_dim);
+
+:class:`SubsetManifold`
+-----------------------
+
+.. class:: SubsetManifold
 
 Suppose :math:`x` is a two dimensional vector, and the user wishes to
 hold the first coordinate constant. Then, :math:`\Delta` is a scalar
 and :math:`\boxplus` is defined as
 
-.. math:: \boxplus(x, \Delta) = x + \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \Delta
+.. math::
+   \boxplus(x, \Delta) = x + \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \Delta
 
-:class:`SubsetParameterization` generalizes this construction to hold
+and given two two-dimensional vectors :math:`x` and :math:`y` with
+the same first coordinate, :math:`\boxminus` is defined as:
+
+.. math::
+   \boxminus(y, x) = y[1] - x[1]
+
+:class:`SubsetManifold` generalizes this construction to hold
 any part of a parameter block constant by specifying the set of
 coordinates that are held constant.
 
 .. NOTE::
-   It is legal to hold all coordinates of a parameter block to constant
-   using a :class:`SubsetParameterization`. It is the same as calling
+
+   It is legal to hold *all* coordinates of a parameter block
+   constant using a :class:`SubsetManifold`. It is the same as calling
    :func:`Problem::SetParameterBlockConstant` on that parameter block.
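+
+For example, to keep the first coordinate of a two dimensional
+parameter block fixed while optimizing the second (a usage sketch,
+assuming an existing :class:`Problem` named ``problem``):
+
+.. code-block:: c++
+
+   double x[2] = {1.0, 2.0};
+   problem.AddParameterBlock(x, 2, new ceres::SubsetManifold(2, {0}));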
 
-:class:`HomogeneousVectorParameterization`
-------------------------------------------
 
-In computer vision, homogeneous vectors are commonly used to represent
-objects in projective geometry such as points in projective space. One
-example where it is useful to use this over-parameterization is in
-representing points whose triangulation is ill-conditioned. Here it is
-advantageous to use homogeneous vectors, instead of an Euclidean
-vector, because it can represent points at and near infinity.
+:class:`ProductManifold`
+------------------------
 
-:class:`HomogeneousVectorParameterization` defines a
-:class:`LocalParameterization` for an :math:`n-1` dimensional
-manifold that embedded in :math:`n` dimensional space where the
-scale of the vector does not matter, i.e., elements of the
-projective space :math:`\mathbb{P}^{n-1}`. It assumes that the last
-coordinate of the :math:`n`-vector is the *scalar* component of the
-homogenous vector, i.e., *finite* points in this representation are
-those for which the *scalar* component is non-zero.
-
-Further, ``HomogeneousVectorParameterization::Plus`` preserves the
-scale of :math:`x`.
-
-:class:`LineParameterization`
------------------------------
-
-This class provides a parameterization for lines, where the line is
-defined using an origin point and a direction vector. So the
-parameter vector size needs to be two times the ambient space
-dimension, where the first half is interpreted as the origin point
-and the second half as the direction. This local parameterization is
-a special case of the `Affine Grassmannian manifold
-<https://en.wikipedia.org/wiki/Affine_Grassmannian_(manifold))>`_
-for the case :math:`\operatorname{Graff}_1(R^n)`.
-
-Note that this is a parameterization for a line, rather than a point
-constrained to lie on a line. It is useful when one wants to optimize
-over the space of lines. For example, :math:`n` distinct points in 3D
-(measurements) we want to find the line that minimizes the sum of
-squared distances to all the points.
-
-:class:`ProductParameterization`
---------------------------------
-
-Consider an optimization problem over the space of rigid
-transformations :math:`SE(3)`, which is the Cartesian product of
-:math:`SO(3)` and :math:`\mathbb{R}^3`. Suppose you are using
-Quaternions to represent the rotation, Ceres ships with a local
-parameterization for that and :math:`\mathbb{R}^3` requires no, or
-:class:`IdentityParameterization` parameterization. So how do we
-construct a local parameterization for a parameter block a rigid
-transformation?
+.. class:: ProductManifold
 
 In cases where a parameter block is the Cartesian product of a number
+of manifolds and you have the manifolds of the individual factors
+available, :class:`ProductManifold` can be used to construct a
+:class:`Manifold` of the Cartesian product.
-case of the rigid transformation, where say you have a parameter block
-of size 7, where the first four entries represent the rotation as a
-quaternion, a local parameterization can be constructed as
+of manifolds and you have the manifold of the individual
+parameter blocks available, :class:`ProductManifold` can be used to
+construct a :class:`Manifold` of the Cartesian product.
+
+For example, consider a rigid transformation represented by a
+parameter block of size 7, where the first four entries are the
+rotation as a quaternion and the next three the translation. A
+manifold for it can be constructed as:
 
 .. code-block:: c++
 
-   ProductParameterization se3_param(new QuaternionParameterization(),
-                                     new IdentityParameterization(3));
+   ProductManifold<QuaternionManifold, EuclideanManifold<3>> se3;
+
+Manifolds can be copied and moved to :class:`ProductManifold`:
+
+.. code-block:: c++
+
+   SubsetManifold manifold1(5, {2});
+   SubsetManifold manifold2(3, {0, 1});
+   ProductManifold<SubsetManifold, SubsetManifold> manifold(manifold1,
+                                                            manifold2);
+
+In advanced use cases, manifolds can be dynamically allocated and passed as (smart) pointers:
+
+.. code-block:: c++
+
+   ProductManifold<std::unique_ptr<QuaternionManifold>, EuclideanManifold<3>> se3
+        {std::make_unique<QuaternionManifold>(), EuclideanManifold<3>{}};
+
+The template parameters can also be left out as they are deduced
+automatically, making the initialization much simpler:
+
+.. code-block:: c++
+
+   ProductManifold se3{QuaternionManifold{}, EuclideanManifold<3>{}};
 
 
-:class:`AutoDiffLocalParameterization`
-======================================
+:class:`QuaternionManifold`
+---------------------------
 
-.. class:: AutoDiffLocalParameterization
+.. class:: QuaternionManifold
 
-  :class:`AutoDiffLocalParameterization` does for
-  :class:`LocalParameterization` what :class:`AutoDiffCostFunction`
-  does for :class:`CostFunction`. It allows the user to define a
-  templated functor that implements the
-  :func:`LocalParameterization::Plus` operation and it uses automatic
-  differentiation to implement the computation of the Jacobian.
+.. NOTE::
 
-  To get an auto differentiated local parameterization, you must
-  define a class with a templated operator() (a functor) that computes
+   If you are using ``Eigen`` quaternions, then you should use
+   :class:`EigenQuaternionManifold` instead because ``Eigen`` uses a
+   different memory layout for its Quaternions.
 
-     .. math:: x' = \boxplus(x, \Delta x),
+Manifold for a Hamilton `Quaternion
+<https://en.wikipedia.org/wiki/Quaternion>`_. Quaternions are a three
+dimensional manifold represented as unit norm 4-vectors, i.e.
 
-  For example, Quaternions have a three dimensional local
-  parameterization. Its plus operation can be implemented as (taken
-  from `internal/ceres/autodiff_local_parameterization_test.cc
-  <https://ceres-solver.googlesource.com/ceres-solver/+/master/internal/ceres/autodiff_local_parameterization_test.cc>`_
-  )
+.. math:: q = \left [\begin{matrix}q_0,& q_1,& q_2,& q_3\end{matrix}\right], \quad \|q\| = 1
 
-    .. code-block:: c++
+is the ambient space representation. Here :math:`q_0` is the scalar
+part. :math:`q_1` is the coefficient of :math:`i`, :math:`q_2` is the
+coefficient of :math:`j`, and :math:`q_3` is the coefficient of
+:math:`k`, where:
 
-      struct QuaternionPlus {
-        template<typename T>
-        bool operator()(const T* x, const T* delta, T* x_plus_delta) const {
-          const T squared_norm_delta =
-              delta[0] * delta[0] + delta[1] * delta[1] + delta[2] * delta[2];
+.. math::
 
-          T q_delta[4];
-          if (squared_norm_delta > 0.0) {
-            T norm_delta = sqrt(squared_norm_delta);
-            const T sin_delta_by_delta = sin(norm_delta) / norm_delta;
-            q_delta[0] = cos(norm_delta);
-            q_delta[1] = sin_delta_by_delta * delta[0];
-            q_delta[2] = sin_delta_by_delta * delta[1];
-            q_delta[3] = sin_delta_by_delta * delta[2];
-          } else {
-            // We do not just use q_delta = [1,0,0,0] here because that is a
-            // constant and when used for automatic differentiation will
-            // lead to a zero derivative. Instead we take a first order
-            // approximation and evaluate it at zero.
-            q_delta[0] = T(1.0);
-            q_delta[1] = delta[0];
-            q_delta[2] = delta[1];
-            q_delta[3] = delta[2];
-          }
+   \begin{align*}
+   i\times j &= k,\\
+   j\times k &= i,\\
+   k\times i &= j,\\
+   i\times i &= -1,\\
+   j\times j &= -1,\\
+   k\times k &= -1.
+   \end{align*}
 
-          Quaternionproduct(q_delta, x, x_plus_delta);
-          return true;
-        }
-      };
+The tangent space is three dimensional and the :math:`\boxplus` and
+:math:`\boxminus` operators are defined in terms of :math:`\exp` and
+:math:`\log` operations.
 
-  Given this struct, the auto differentiated local
-  parameterization can now be constructed as
+.. math::
+   \begin{align*}
+   \boxplus(x, \Delta) &= \exp\left(\Delta\right) \otimes  x \\
+   \boxminus(y,x) &= \log\left(y \otimes x^{-1}\right)
+   \end{align*}
 
-  .. code-block:: c++
+Where :math:`\otimes` is the `Quaternion product
+<https://en.wikipedia.org/wiki/Quaternion#Hamilton_product>`_ and
+since :math:`x` is a unit quaternion, :math:`x^{-1} = [\begin{matrix}
+q_0,& -q_1,& -q_2,& -q_3\end{matrix}]`. Given a vector :math:`\Delta
+\in \mathbb{R}^3`,
 
-     LocalParameterization* local_parameterization =
-         new AutoDiffLocalParameterization<QuaternionPlus, 4, 3>;
-                                                           |  |
-                                Global Size ---------------+  |
-                                Local Size -------------------+
+.. math::
+   \exp(\Delta) = \left[ \begin{matrix}
+                         \cos\left(\|\Delta\|\right)\\
+                         \frac{\displaystyle \sin\left(\|\Delta\|\right)}{\displaystyle \|\Delta\|} \Delta
+                         \end{matrix} \right]
 
+and given a unit quaternion :math:`q = \left [\begin{matrix}q_0,& q_1,& q_2,& q_3\end{matrix}\right]`
+
+.. math::
+
+   \log(q) =  \frac{\operatorname{atan2}\left(\sqrt{1-q_0^2},q_0\right)}{\sqrt{1-q_0^2}} \left [\begin{matrix}q_1,& q_2,& q_3\end{matrix}\right]
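+
+A typical usage sketch (assuming an existing :class:`Problem` named
+``problem``) associates the manifold with a scalar-first quaternion
+parameter block:
+
+.. code-block:: c++
+
+   double q[4] = {1.0, 0.0, 0.0, 0.0};  // [q_0, q_1, q_2, q_3], scalar first.
+   problem.AddParameterBlock(q, 4, new ceres::QuaternionManifold);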
+
+
+:class:`EigenQuaternionManifold`
+--------------------------------
+
+.. class:: EigenQuaternionManifold
+
+Implements the quaternion manifold for `Eigen's
+<http://eigen.tuxfamily.org/index.php?title=Main_Page>`_
+representation of the Hamilton quaternion. Geometrically it is exactly
+the same as the :class:`QuaternionManifold` defined above. However,
+Eigen uses a different internal memory layout for the elements of the
+quaternion than what is commonly used. It stores the quaternion in
+memory as :math:`[q_1, q_2, q_3, q_0]` or :math:`[x, y, z, w]` where
+the real (scalar) part is last.
+
+Since Ceres operates on parameter blocks which are raw double pointers
+this difference is important and requires a different manifold.
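+
+A usage sketch (again assuming an existing :class:`Problem` named
+``problem``); note that ``Eigen::Quaterniond::coeffs()`` exposes the
+:math:`[x, y, z, w]` storage directly:
+
+.. code-block:: c++
+
+   Eigen::Quaterniond q = Eigen::Quaterniond::Identity();
+   problem.AddParameterBlock(q.coeffs().data(), 4,
+                             new ceres::EigenQuaternionManifold);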
+
+:class:`SphereManifold`
+-----------------------
+
+.. class:: SphereManifold
+
+This provides a manifold on a sphere, meaning that the norm of the
+vector stays the same. Such cases often arise in Structure from Motion
+problems. One example where they are used is in representing points
+whose triangulation is ill-conditioned. Here it is advantageous to use
+an over-parameterization since homogeneous vectors can represent
+points at infinity.
+
+The ambient space dimension is required to be greater than 1.
+
+The class works with dynamic and static ambient space dimensions. If
+the ambient space dimension is known at compile time, use
+
+.. code-block:: c++
+
+   SphereManifold<3> manifold;
+
+If the ambient space dimension is not known at compile time, the
+template parameter needs to be set to ``ceres::DYNAMIC`` and the actual
+dimension needs to be provided as a constructor argument:
+
+.. code-block:: c++
+
+   SphereManifold<ceres::DYNAMIC> manifold(ambient_dim);
+
+For more details, please see Section B.2 (p.25) in [Hertzberg]_.
+
+
+:class:`LineManifold`
+---------------------
+
+.. class:: LineManifold
+
+This class provides a manifold for lines, where the line is defined
+using an origin point and a direction vector. So the ambient size
+needs to be two times the dimension of the space in which the line
+lives.  The first half of the parameter block is interpreted as the
+origin point and the second half as the direction. This manifold is a
+special case of the `Affine Grassmannian manifold
+<https://en.wikipedia.org/wiki/Affine_Grassmannian_(manifold))>`_ for
+the case :math:`\operatorname{Graff}_1(R^n)`.
+
+Note that this is a manifold for a line, rather than a point
+constrained to lie on a line. It is useful when one wants to optimize
+over the space of lines. For example, given :math:`n` distinct points
+in 3D (measurements), we want to find the line that minimizes the sum
+of squared distances to all the points.
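+
+As a usage sketch (assuming an existing :class:`Problem` named
+``problem``), a line in :math:`\mathbb{R}^3` is stored as a 6-vector,
+origin followed by direction:
+
+.. code-block:: c++
+
+   double line[6] = {0.0, 0.0, 0.0, 1.0, 0.0, 0.0};  // origin, direction
+   problem.AddParameterBlock(line, 6, new ceres::LineManifold<3>);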
+
+:class:`AutoDiffManifold`
+=========================
+
+.. class:: AutoDiffManifold
+
+Create a :class:`Manifold` with Jacobians computed via automatic
+differentiation.
+
+To get an auto differentiated manifold, you must define a Functor with
+templated ``Plus`` and ``Minus`` functions that compute:
+
+.. code-block:: c++
+
+  x_plus_delta = Plus(x, delta);
+  y_minus_x    = Minus(y, x);
+
+Where, ``x``, ``y`` and ``x_plus_delta`` are vectors on the manifold in
+the ambient space (so they are ``kAmbientSize`` vectors) and
+``delta``, ``y_minus_x`` are vectors in the tangent space (so they are
+``kTangentSize`` vectors).
+
+The Functor should have the signature:
+
+.. code-block:: c++
+
+   struct Functor {
+    template <typename T>
+    bool Plus(const T* x, const T* delta, T* x_plus_delta) const;
+
+    template <typename T>
+    bool Minus(const T* y, const T* x, T* y_minus_x) const;
+   };
+
+
+Observe that the ``Plus`` and ``Minus`` operations are templated on
+the parameter ``T``. The autodiff framework substitutes appropriate
+``Jet`` objects for ``T`` in order to compute the derivative when
+necessary. This is the same mechanism that is used to compute
+derivatives when using :class:`AutoDiffCostFunction`.
+
+``Plus`` and ``Minus`` should return true if the computation is
+successful and false otherwise, in which case the result will not be
+used.
+
+Given this Functor, the corresponding :class:`Manifold` can be constructed as:
+
+.. code-block:: c++
+
+   AutoDiffManifold<Functor, kAmbientSize, kTangentSize> manifold;
+
+.. NOTE::
+
+   The following is only used for illustration purposes. Ceres Solver
+   ships with an optimized, production grade :class:`QuaternionManifold`
+   implementation.
+
+As a concrete example consider the case of `Quaternions
+<https://en.wikipedia.org/wiki/Quaternion>`_. Quaternions form a three
+dimensional manifold embedded in :math:`\mathbb{R}^4`, i.e. they have
+an ambient dimension of 4 and their tangent space has dimension 3. The
+following Functor defines the ``Plus`` and ``Minus`` operations on the
+Quaternion manifold. It assumes that the quaternions are laid out as
+``[w,x,y,z]`` in memory, i.e. the real or scalar part is the first
+coordinate.
+
+.. code-block:: c++
+
+   struct QuaternionFunctor {
+     template <typename T>
+     bool Plus(const T* x, const T* delta, T* x_plus_delta) const {
+       const T squared_norm_delta =
+           delta[0] * delta[0] + delta[1] * delta[1] + delta[2] * delta[2];
+
+       T q_delta[4];
+       if (squared_norm_delta > T(0.0)) {
+         T norm_delta = sqrt(squared_norm_delta);
+         const T sin_delta_by_delta = sin(norm_delta) / norm_delta;
+         q_delta[0] = cos(norm_delta);
+         q_delta[1] = sin_delta_by_delta * delta[0];
+         q_delta[2] = sin_delta_by_delta * delta[1];
+         q_delta[3] = sin_delta_by_delta * delta[2];
+       } else {
+         // We do not just use q_delta = [1,0,0,0] here because that is a
+         // constant and when used for automatic differentiation will
+         // lead to a zero derivative. Instead we take a first order
+         // approximation and evaluate it at zero.
+         q_delta[0] = T(1.0);
+         q_delta[1] = delta[0];
+         q_delta[2] = delta[1];
+         q_delta[3] = delta[2];
+       }
+
+       QuaternionProduct(q_delta, x, x_plus_delta);
+       return true;
+     }
+
+     template <typename T>
+     bool Minus(const T* y, const T* x, T* y_minus_x) const {
+       T minus_x[4] = {x[0], -x[1], -x[2], -x[3]};
+       T ambient_y_minus_x[4];
+       QuaternionProduct(y, minus_x, ambient_y_minus_x);
+       T u_norm = sqrt(ambient_y_minus_x[1] * ambient_y_minus_x[1] +
+                       ambient_y_minus_x[2] * ambient_y_minus_x[2] +
+                       ambient_y_minus_x[3] * ambient_y_minus_x[3]);
+       if (u_norm > 0.0) {
+         T theta = atan2(u_norm, ambient_y_minus_x[0]);
+         y_minus_x[0] = theta * ambient_y_minus_x[1] / u_norm;
+         y_minus_x[1] = theta * ambient_y_minus_x[2] / u_norm;
+         y_minus_x[2] = theta * ambient_y_minus_x[3] / u_norm;
+       } else {
+         // We do not use [0,0,0] here because even though the value part is
+         // a constant, the derivative part is not.
+         y_minus_x[0] = ambient_y_minus_x[1];
+         y_minus_x[1] = ambient_y_minus_x[2];
+         y_minus_x[2] = ambient_y_minus_x[3];
+       }
+       return true;
+     }
+   };
+
+
+Then given this struct, the auto differentiated Quaternion Manifold can now
+be constructed as
+
+.. code-block:: c++
+
+   Manifold* manifold = new AutoDiffManifold<QuaternionFunctor, 4, 3>;
 
 :class:`Problem`
 ================
@@ -1581,10 +1872,10 @@
 
    :func:`Problem::AddParameterBlock` explicitly adds a parameter
    block to the :class:`Problem`. Optionally it allows the user to
-   associate a :class:`LocalParameterization` object with the
-   parameter block too. Repeated calls with the same arguments are
-   ignored. Repeated calls with the same double pointer but a
-   different size results in undefined behavior.
+   associate a :class:`Manifold` object with the parameter block
+   too. Repeated calls with the same arguments are ignored. Repeated
+   calls with the same double pointer but a different size results in
+   undefined behavior.
 
    You can set any parameter block to be constant using
    :func:`Problem::SetParameterBlockConstant` and undo this using
@@ -1604,17 +1895,16 @@
    **Ownership**
 
    :class:`Problem` by default takes ownership of the
-   ``cost_function``, ``loss_function`` and ``local_parameterization``
-   pointers. These objects remain live for the life of the
-   :class:`Problem`. If the user wishes to keep control over the
-   destruction of these objects, then they can do this by setting the
-   corresponding enums in the :class:`Problem::Options` struct.
+   ``cost_function``, ``loss_function`` and ``manifold`` pointers. These
+   objects remain live for the life of the :class:`Problem`. If the user wishes
+   to keep control over the destruction of these objects, then they can do this
+   by setting the corresponding enums in the :class:`Problem::Options` struct.
 
-   Note that even though the Problem takes ownership of ``cost_function``
-   and ``loss_function``, it does not preclude the user from re-using
-   them in another residual block. The destructor takes care to call
-   delete on each ``cost_function`` or ``loss_function`` pointer only
-   once, regardless of how many residual blocks refer to them.
+   Note that even though the Problem takes ownership of the
+   ``cost_function`` and ``loss_function`` objects, it does not preclude
+   the user from re-using them in another residual block. Similarly the
+   same ``manifold`` object can be used with multiple parameter blocks. The
+   destructor takes care to call delete on each owned object exactly once.
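+
+   For example, to retain ownership of all three kinds of objects (a
+   sketch; the ownership options are documented below):
+
+   .. code-block:: c++
+
+      ceres::Problem::Options options;
+      options.cost_function_ownership = ceres::DO_NOT_TAKE_OWNERSHIP;
+      options.loss_function_ownership = ceres::DO_NOT_TAKE_OWNERSHIP;
+      options.manifold_ownership = ceres::DO_NOT_TAKE_OWNERSHIP;
+      ceres::Problem problem(options);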
 
 .. class:: Problem::Options
 
@@ -1627,7 +1917,7 @@
    This option controls whether the Problem object owns the cost
    functions.
 
-   If set to TAKE_OWNERSHIP, then the problem object will delete the
+   If set to ``TAKE_OWNERSHIP``, then the problem object will delete the
    cost functions on destruction. The destructor is careful to delete
    the pointers only once, since sharing cost functions is allowed.
 
@@ -1638,21 +1928,19 @@
    This option controls whether the Problem object owns the loss
    functions.
 
-   If set to TAKE_OWNERSHIP, then the problem object will delete the
+   If set to ``TAKE_OWNERSHIP``, then the problem object will delete the
    loss functions on destruction. The destructor is careful to delete
    the pointers only once, since sharing loss functions is allowed.
 
-.. member:: Ownership Problem::Options::local_parameterization_ownership
+.. member:: Ownership Problem::Options::manifold_ownership
 
    Default: ``TAKE_OWNERSHIP``
 
-   This option controls whether the Problem object owns the local
-   parameterizations.
+   This option controls whether the Problem object owns the manifolds.
 
-   If set to TAKE_OWNERSHIP, then the problem object will delete the
-   local parameterizations on destruction. The destructor is careful
-   to delete the pointers only once, since sharing local
-   parameterizations is allowed.
+   If set to ``TAKE_OWNERSHIP``, then the problem object will delete the
+   manifolds on destruction. The destructor is careful to delete the
+   pointers only once, since sharing manifolds is allowed.
 
 .. member:: bool Problem::Options::enable_fast_removal
 
@@ -1689,12 +1977,13 @@
     overhead you want to avoid, then you can set
     disable_all_safety_checks to true.
 
-    **WARNING** Do not set this to true, unless you are absolutely
-    sure of what you are doing.
+    .. warning::
+        Do not set this to true, unless you are absolutely sure of what you are
+        doing.
 
 .. member:: Context* Problem::Options::context
 
-    Default: `nullptr`
+    Default: ``nullptr``
 
     A Ceres global context to use for solving this problem. This may
     help to reduce computation time as Ceres can reuse expensive
@@ -1705,7 +1994,7 @@
 
 .. member:: EvaluationCallback* Problem::Options::evaluation_callback
 
-    Default: `nullptr`
+    Default: ``nullptr``
 
     Using this callback interface, Ceres will notify you when it is
     about to evaluate the residuals or Jacobians.
@@ -1725,11 +2014,11 @@
 
        Evaluation callbacks are incompatible with inner iterations. So
        calling Solve with
-       :member:`Solver::Options::use_inner_iterations` set to `true`
+       :member:`Solver::Options::use_inner_iterations` set to ``true``
        on a :class:`Problem` with a non-null evaluation callback is an
        error.
 
-.. function:: ResidualBlockId Problem::AddResidualBlock(CostFunction* cost_function, LossFunction* loss_function, const vector<double*> parameter_blocks)
+.. function:: ResidualBlockId Problem::AddResidualBlock(CostFunction* cost_function, LossFunction* loss_function, const std::vector<double*> parameter_blocks)
 
 .. function:: template <typename Ts...> ResidualBlockId Problem::AddResidualBlock(CostFunction* cost_function, LossFunction* loss_function, double* x0, Ts... xs)
 
@@ -1738,7 +2027,7 @@
    parameter blocks it expects. The function checks that these match
    the sizes of the parameter blocks listed in parameter_blocks. The
    program aborts if a mismatch is detected. loss_function can be
-   `nullptr`, in which case the cost of the term is just the squared
+   ``nullptr``, in which case the cost of the term is just the squared
    norm of the residuals.
 
    The parameter blocks may be passed together as a
@@ -1756,11 +2045,12 @@
    keep control over the destruction of these objects, then they can
    do this by setting the corresponding enums in the Options struct.
 
-   Note: Even though the Problem takes ownership of cost_function
-   and loss_function, it does not preclude the user from re-using
-   them in another residual block. The destructor takes care to call
-   delete on each cost_function or loss_function pointer only once,
-   regardless of how many residual blocks refer to them.
+   .. note::
+       Even though the Problem takes ownership of ``cost_function``
+       and ``loss_function``, it does not preclude the user from re-using
+       them in another residual block. The destructor takes care to call
+       delete on each cost_function or loss_function pointer only once,
+       regardless of how many residual blocks refer to them.
 
    Example usage:
 
@@ -1769,9 +2059,9 @@
       double x1[] = {1.0, 2.0, 3.0};
       double x2[] = {1.0, 2.0, 5.0, 6.0};
       double x3[] = {3.0, 6.0, 2.0, 5.0, 1.0};
-      vector<double*> v1;
+      std::vector<double*> v1;
       v1.push_back(x1);
-      vector<double*> v2;
+      std::vector<double*> v2;
       v2.push_back(x2);
       v2.push_back(x1);
 
@@ -1782,12 +2072,19 @@
       problem.AddResidualBlock(new MyUnaryCostFunction(...), nullptr, v1);
       problem.AddResidualBlock(new MyBinaryCostFunction(...), nullptr, v2);
 
-.. function:: void Problem::AddParameterBlock(double* values, int size, LocalParameterization* local_parameterization)
+.. function:: void Problem::AddParameterBlock(double* values, int size, Manifold* manifold)
 
-   Add a parameter block with appropriate size to the problem.
+   Add a parameter block with appropriate size and Manifold to the
+   problem. It is okay for ``manifold`` to be ``nullptr``.
+
    Repeated calls with the same arguments are ignored. Repeated calls
-   with the same double pointer but a different size results in
-   undefined behavior.
+   with the same double pointer but a different size result in a crash
+   (unless :member:`Solver::Options::disable_all_safety_checks` is set
+   to ``true``).
+
+   Repeated calls with the same double pointer and size but a different
+   :class:`Manifold` are equivalent to calling ``SetManifold(manifold)``,
+   i.e., any previously associated :class:`Manifold` object will be
+   replaced with ``manifold``.
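+
+   A minimal sketch of adding a parameter block together with a manifold
+   (here a quaternion block, using the :class:`QuaternionManifold`
+   discussed elsewhere in this manual):
+
+   .. code-block:: c++
+
+      double q[4] = {1.0, 0.0, 0.0, 0.0};  // Unit quaternion (identity rotation).
+      Problem problem;
+      // By default the Problem takes ownership of the manifold object.
+      problem.AddParameterBlock(q, 4, new QuaternionManifold);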
 
 .. function:: void Problem::AddParameterBlock(double* values, int size)
 
@@ -1798,35 +2095,46 @@
 
 .. function:: void Problem::RemoveResidualBlock(ResidualBlockId residual_block)
 
-   Remove a residual block from the problem. Any parameters that the residual
-   block depends on are not removed. The cost and loss functions for the
-   residual block will not get deleted immediately; won't happen until the
-   problem itself is deleted.  If Problem::Options::enable_fast_removal is
-   true, then the removal is fast (almost constant time). Otherwise, removing a
-   residual block will incur a scan of the entire Problem object to verify that
-   the residual_block represents a valid residual in the problem.
+   Remove a residual block from the problem.
 
-   **WARNING:** Removing a residual or parameter block will destroy
-   the implicit ordering, rendering the jacobian or residuals returned
-   from the solver uninterpretable. If you depend on the evaluated
-   jacobian, do not use remove! This may change in a future release.
-   Hold the indicated parameter block constant during optimization.
+   Since residual blocks are allowed to share cost function and loss
+   function objects, Ceres Solver uses a reference counting
+   mechanism. So when a residual block is deleted, the reference count
+   for the corresponding cost function and loss function objects are
+   decreased and when this count reaches zero, they are deleted.
+
+   If :member:`Problem::Options::enable_fast_removal` is ``true``, then the removal
+   is fast (almost constant time). Otherwise it is linear, requiring a
+   scan of the entire problem.
+
+   Removing a residual block has no effect on the parameter blocks
+   that the problem depends on.
+
+   .. warning::
+       Removing a residual or parameter block will destroy the implicit
+       ordering, rendering the jacobian or residuals returned from the solver
+       uninterpretable. If you depend on the evaluated jacobian, do not use
+       remove! This may change in a future release.
 
 .. function:: void Problem::RemoveParameterBlock(const double* values)
 
-   Remove a parameter block from the problem. The parameterization of
-   the parameter block, if it exists, will persist until the deletion
-   of the problem (similar to cost/loss functions in residual block
-   removal). Any residual blocks that depend on the parameter are also
-   removed, as described above in RemoveResidualBlock().  If
-   Problem::Options::enable_fast_removal is true, then
-   the removal is fast (almost constant time). Otherwise, removing a
-   parameter block will incur a scan of the entire Problem object.
+   Remove a parameter block from the problem. Any residual blocks that
+   depend on the parameter are also removed, as described above in
+   :func:`RemoveResidualBlock()`.
 
-   **WARNING:** Removing a residual or parameter block will destroy
-   the implicit ordering, rendering the jacobian or residuals returned
-   from the solver uninterpretable. If you depend on the evaluated
-   jacobian, do not use remove! This may change in a future release.
+   The manifold of the parameter block, if it exists, will persist until the
+   deletion of the problem.
+
+   If :member:`Problem::Options::enable_fast_removal` is ``true``, then the removal
+   is fast (almost constant time). Otherwise, removing a parameter
+   block will scan the entire Problem.
+
+   .. warning::
+       Removing a residual or parameter block will destroy the implicit
+       ordering, rendering the jacobian or residuals returned from the solver
+       uninterpretable. If you depend on the evaluated jacobian, do not use
+       remove! This may change in a future release.
 
 .. function:: void Problem::SetParameterBlockConstant(const double* values)
 
@@ -1841,23 +2149,34 @@
    Returns ``true`` if a parameter block is set constant, and false
    otherwise. A parameter block may be set constant in two ways:
    either by calling ``SetParameterBlockConstant`` or by associating a
-   ``LocalParameterization`` with a zero dimensional tangent space
-   with it.
+   :class:`Manifold` with a zero dimensional tangent space with it.
 
-.. function:: void Problem::SetParameterization(double* values, LocalParameterization* local_parameterization)
+.. function:: void Problem::SetManifold(double* values, Manifold* manifold)
 
-   Set the local parameterization for one of the parameter blocks.
-   The local_parameterization is owned by the Problem by default. It
-   is acceptable to set the same parameterization for multiple
-   parameters; the destructor is careful to delete local
-   parameterizations only once. Calling `SetParameterization` with
-   `nullptr` will clear any previously set parameterization.
+   Set the :class:`Manifold` for the parameter block. Calling
+   :func:`Problem::SetManifold` with ``nullptr`` will clear any
+   previously set :class:`Manifold` for the parameter block.
 
-.. function:: LocalParameterization* Problem::GetParameterization(const double* values) const
+   Repeated calls will result in any previously associated
+   :class:`Manifold` object to be replaced with ``manifold``.
 
-   Get the local parameterization object associated with this
-   parameter block. If there is no parameterization object associated
-   then `nullptr` is returned
+   ``manifold`` is owned by :class:`Problem` by default (See
+   :class:`Problem::Options` to override this behaviour).
+
+   It is acceptable to set the same :class:`Manifold` for multiple
+   parameter blocks.
+
+.. function:: const Manifold* Problem::GetManifold(const double* values) const
+
+   Get the :class:`Manifold` object associated with this parameter block.
+
+   If there is no :class:`Manifold` object associated with the parameter block,
+   then ``nullptr`` is returned.
+
+.. function:: bool Problem::HasManifold(const double* values) const
+
+   Returns ``true`` if a :class:`Manifold` is associated with this parameter
+   block, ``false`` otherwise.
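+
+   A minimal sketch of associating, querying and clearing a manifold
+   (assuming a 4-element quaternion block ``q`` that has already been
+   added to ``problem``):
+
+   .. code-block:: c++
+
+      problem.SetManifold(q, new QuaternionManifold);
+      // problem.HasManifold(q) is now true and
+      // problem.ParameterBlockTangentSize(q) returns 3.
+
+      problem.SetManifold(q, nullptr);  // Clears the manifold.
+      // problem.GetManifold(q) now returns nullptr.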
 
 .. function:: void Problem::SetParameterLowerBound(double* values, int index, double lower_bound)
 
@@ -1909,43 +2228,42 @@
 
    The size of the parameter block.
 
-.. function:: int Problem::ParameterBlockLocalSize(const double* values) const
+.. function:: int Problem::ParameterBlockTangentSize(const double* values) const
 
-   The size of local parameterization for the parameter block. If
-   there is no local parameterization associated with this parameter
-   block, then ``ParameterBlockLocalSize`` = ``ParameterBlockSize``.
+   The dimension of the tangent space of the :class:`Manifold` for the
+   parameter block. If there is no :class:`Manifold` associated with this
+   parameter block, then ``ParameterBlockTangentSize = ParameterBlockSize``.
 
 .. function:: bool Problem::HasParameterBlock(const double* values) const
 
    Is the given parameter block present in the problem or not?
 
-.. function:: void Problem::GetParameterBlocks(vector<double*>* parameter_blocks) const
+.. function:: void Problem::GetParameterBlocks(std::vector<double*>* parameter_blocks) const
 
    Fills the passed ``parameter_blocks`` vector with pointers to the
    parameter blocks currently in the problem. After this call,
    ``parameter_block.size() == NumParameterBlocks``.
 
-.. function:: void Problem::GetResidualBlocks(vector<ResidualBlockId>* residual_blocks) const
+.. function:: void Problem::GetResidualBlocks(std::vector<ResidualBlockId>* residual_blocks) const
 
    Fills the passed `residual_blocks` vector with pointers to the
    residual blocks currently in the problem. After this call,
    `residual_blocks.size() == NumResidualBlocks`.
 
-.. function:: void Problem::GetParameterBlocksForResidualBlock(const ResidualBlockId residual_block, vector<double*>* parameter_blocks) const
+.. function:: void Problem::GetParameterBlocksForResidualBlock(const ResidualBlockId residual_block, std::vector<double*>* parameter_blocks) const
 
    Get all the parameter blocks that depend on the given residual
    block.
 
-.. function:: void Problem::GetResidualBlocksForParameterBlock(const double* values, vector<ResidualBlockId>* residual_blocks) const
+.. function:: void Problem::GetResidualBlocksForParameterBlock(const double* values, std::vector<ResidualBlockId>* residual_blocks) const
 
    Get all the residual blocks that depend on the given parameter
    block.
 
-   If `Problem::Options::enable_fast_removal` is
-   `true`, then getting the residual blocks is fast and depends only
+   If :member:`Problem::Options::enable_fast_removal` is
+   ``true``, then getting the residual blocks is fast and depends only
    on the number of residual blocks. Otherwise, getting the residual
-   blocks for a parameter block will incur a scan of the entire
-   :class:`Problem` object.
+   blocks for a parameter block will scan the entire problem.
 
 .. function:: const CostFunction* Problem::GetCostFunctionForResidualBlock(const ResidualBlockId residual_block) const
 
@@ -1974,11 +2292,10 @@
    function returns false, the caller should expect the output
    memory locations to have been modified.
 
-   The returned cost and jacobians have had robustification and local
-   parameterizations applied already; for example, the jacobian for a
-   4-dimensional quaternion parameter using the
-   :class:`QuaternionParameterization` is ``num_residuals x 3``
-   instead of ``num_residuals x 4``.
+   The returned cost and jacobians have had robustification and
+   :class:`Manifold` applied already; for example, the jacobian for a
+   4-dimensional quaternion parameter using the :class:`QuaternionManifold` is
+   ``num_residuals x 3`` instead of ``num_residuals x 4``.
 
    ``apply_loss_function`` as the name implies allows the user to
    switch the application of the loss function on and off.
@@ -2014,10 +2331,10 @@
     :func:`Problem::EvaluateResidualBlock`).
 
 
-.. function:: bool Problem::Evaluate(const Problem::EvaluateOptions& options, double* cost, vector<double>* residuals, vector<double>* gradient, CRSMatrix* jacobian)
+.. function:: bool Problem::Evaluate(const Problem::EvaluateOptions& options, double* cost, std::vector<double>* residuals, std::vector<double>* gradient, CRSMatrix* jacobian)
 
    Evaluate a :class:`Problem`. Any of the output pointers can be
-   `nullptr`. Which residual blocks and parameter blocks are used is
+   ``nullptr``. Which residual blocks and parameter blocks are used is
    controlled by the :class:`Problem::EvaluateOptions` struct below.
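+
+   A minimal sketch of evaluating everything in a problem (``problem``
+   is assumed to already contain residual and parameter blocks):
+
+   .. code-block:: c++
+
+      double cost = 0.0;
+      std::vector<double> residuals;
+      std::vector<double> gradient;
+      CRSMatrix jacobian;
+      problem.Evaluate(Problem::EvaluateOptions(),
+                       &cost, &residuals, &gradient, &jacobian);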
 
    .. NOTE::
@@ -2048,10 +2365,10 @@
 
    .. NOTE::
 
-      If no local parameterizations are used, then the size of
-      the gradient vector is the sum of the sizes of all the parameter
-      blocks. If a parameter block has a local parameterization, then
-      it contributes "LocalSize" entries to the gradient vector.
+      If no :class:`Manifold` are used, then the size of the gradient vector is
+      the sum of the sizes of all the parameter blocks. If a parameter block has
+      a manifold then it contributes "TangentSize" entries to the gradient
+      vector.
 
    .. NOTE::
 
@@ -2070,7 +2387,7 @@
 
    Options struct that is used to control :func:`Problem::Evaluate`.
 
-.. member:: vector<double*> Problem::EvaluateOptions::parameter_blocks
+.. member:: std::vector<double*> Problem::EvaluateOptions::parameter_blocks
 
    The set of parameter blocks for which evaluation should be
    performed. This vector determines the order in which parameter
@@ -2086,7 +2403,7 @@
    should NOT point to new memory locations. Bad things will happen if
    you do.
 
-.. member:: vector<ResidualBlockId> Problem::EvaluateOptions::residual_blocks
+.. member:: std::vector<ResidualBlockId> Problem::EvaluateOptions::residual_blocks
 
    The set of residual blocks for which evaluation should be
    performed. This vector determines the order in which the residuals
@@ -2105,7 +2422,7 @@
 
 .. member:: int Problem::EvaluateOptions::num_threads
 
-   Number of threads to use. (Requires OpenMP).
+   Number of threads to use.
 
 
 :class:`EvaluationCallback`
@@ -2120,9 +2437,9 @@
 
       class EvaluationCallback {
        public:
-        virtual ~EvaluationCallback() {}
-        virtual void PrepareForEvaluation()(bool evaluate_jacobians
-                                            bool new_evaluation_point) = 0;
+        virtual ~EvaluationCallback();
+        virtual void PrepareForEvaluation(bool evaluate_jacobians,
+                                          bool new_evaluation_point) = 0;
       };
 
 .. function:: void EvaluationCallback::PrepareForEvaluation(bool evaluate_jacobians, bool new_evaluation_point)
@@ -2131,14 +2448,14 @@
    every time, and once before it computes the residuals and/or the
    Jacobians.
 
-   User parameters (the double* values provided by the us) are fixed
+   User parameters (the double* values provided by the user) are fixed
    until the next call to
    :func:`EvaluationCallback::PrepareForEvaluation`. If
    ``new_evaluation_point == true``, then this is a new point that is
    different from the last evaluated point. Otherwise, it is the same
    point that was evaluated previously (either Jacobian or residual)
    and the user can use cached results from previous evaluations. If
-   ``evaluate_jacobians`` is true, then Ceres will request Jacobians
+   ``evaluate_jacobians`` is ``true``, then Ceres will request Jacobians
    in the upcoming cost evaluation.
 
    Using this callback interface, Ceres can notify you when it is
@@ -2165,7 +2482,10 @@
    to use a global shared variable (discouraged; bug-prone).  As far
    as Ceres is concerned, it is evaluating cost functions like any
    other; it just so happens that behind the scenes the cost functions
-   reuse pre-computed data to execute faster.
+   reuse pre-computed data to execute faster. See
+   `examples/evaluation_callback_example.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/evaluation_callback_example.cc>`_
+   for an example.
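+
+   A minimal sketch of such a callback (the names here are illustrative,
+   not part of the Ceres API):
+
+   .. code-block:: c++
+
+      class SharedDataCallback : public EvaluationCallback {
+       public:
+        void PrepareForEvaluation(bool evaluate_jacobians,
+                                  bool new_evaluation_point) override {
+          if (new_evaluation_point) {
+            // The parameter blocks have changed, so recompute the
+            // expensive shared quantities used by the cost functions.
+            UpdateSharedState();
+          }
+          // If evaluate_jacobians is true, Jacobians will also be
+          // requested in the upcoming evaluation.
+        }
+
+       private:
+        void UpdateSharedState() { /* application specific */ }
+      };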
 
    See ``evaluation_callback_test.cc`` for code that explicitly
    verifies the preconditions between
@@ -2207,14 +2527,14 @@
 .. function:: template <typename T> void RotationMatrixToAngleAxis(T const * R, T * angle_axis)
 .. function:: template <typename T> void AngleAxisToRotationMatrix(T const * angle_axis, T * R)
 
-   Conversions between 3x3 rotation matrix with given column and row strides and
+   Conversions between :math:`3\times3` rotation matrix with given column and row strides and
    axis-angle rotation representations. The functions that take a pointer to T instead
    of a MatrixAdapter assume a column major representation with unit row stride and a column stride of 3.
 
 .. function:: template <typename T, int row_stride, int col_stride> void EulerAnglesToRotationMatrix(const T* euler, const MatrixAdapter<T, row_stride, col_stride>& R)
 .. function:: template <typename T> void EulerAnglesToRotationMatrix(const T* euler, int row_stride, T* R)
 
-   Conversions between 3x3 rotation matrix with given column and row strides and
+   Conversions between :math:`3\times3` rotation matrix with given column and row strides and
    Euler angle (in degrees) rotation representations.
 
    The {pitch,roll,yaw} Euler angles are rotations around the {x,y,z}
@@ -2228,7 +2548,7 @@
 .. function:: template <typename T, int row_stride, int col_stride> void QuaternionToScaledRotation(const T q[4], const MatrixAdapter<T, row_stride, col_stride>& R)
 .. function:: template <typename T> void QuaternionToScaledRotation(const T q[4], T R[3 * 3])
 
-   Convert a 4-vector to a 3x3 scaled rotation matrix.
+   Convert a 4-vector to a :math:`3\times3` scaled rotation matrix.
 
    The choice of rotation is such that the quaternion
    :math:`\begin{bmatrix} 1 &0 &0 &0\end{bmatrix}` goes to an identity
@@ -2242,8 +2562,8 @@
 
    which corresponds to a Rodrigues approximation, the last matrix
    being the cross-product matrix of :math:`\begin{bmatrix} a& b&
-   c\end{bmatrix}`. Together with the property that :math:`R(q1 * q2)
-   = R(q1) * R(q2)` this uniquely defines the mapping from :math:`q` to
+   c\end{bmatrix}`. Together with the property that :math:`R(q_1 \otimes q_2)
+   = R(q_1) R(q_2)` this uniquely defines the mapping from :math:`q` to
    :math:`R`.
 
    In the function that accepts a pointer to T instead of a MatrixAdapter,
@@ -2252,7 +2572,7 @@
 
    No normalization of the quaternion is performed, i.e.
    :math:`R = \|q\|^2  Q`, where :math:`Q` is an orthonormal matrix
-   such that :math:`\det(Q) = 1` and :math:`Q*Q' = I`.
+   such that :math:`\det(Q) = 1` and :math:`QQ' = I`.
 
 
 .. function:: template <typename T> void QuaternionToRotation(const T q[4], const MatrixAdapter<T, row_stride, col_stride>& R)
@@ -2279,9 +2599,9 @@
 
 .. function:: template <typename T> void QuaternionProduct(const T z[4], const T w[4], T zw[4])
 
-   .. math:: zw = z * w
+   .. math:: zw = z \otimes w
 
-   where :math:`*` is the Quaternion product between 4-vectors.
+   where :math:`\otimes` is the Quaternion product between 4-vectors.
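+
+   For example (a sketch; these templates are declared in
+   ``ceres/rotation.h``):
+
+   .. code-block:: c++
+
+      #include "ceres/rotation.h"
+
+      // Quaternions are stored as (w, x, y, z) with the scalar first.
+      double z[4] = {1.0, 0.0, 0.0, 0.0};  // Identity rotation.
+      double w[4] = {0.0, 1.0, 0.0, 0.0};  // 180 degree rotation about x.
+      double zw[4];
+      ceres::QuaternionProduct(z, w, zw);  // zw == w, since z is the identity.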
 
 
 .. function:: template <typename T> void CrossProduct(const T x[3], const T y[3], T x_cross_y[3])
diff --git a/docs/source/nnls_solving.rst b/docs/source/nnls_solving.rst
index 285df3a..184b94b 100644
--- a/docs/source/nnls_solving.rst
+++ b/docs/source/nnls_solving.rst
@@ -1,3 +1,4 @@
+.. highlight:: c++
 
 .. default-domain:: cpp
 
@@ -12,10 +13,10 @@
 Introduction
 ============
 
-Effective use of Ceres requires some familiarity with the basic
+Effective use of Ceres Solver requires some familiarity with the basic
 components of a non-linear least squares solver, so before we describe
 how to configure and use the solver, we will take a brief look at how
-some of the core optimization algorithms in Ceres work.
+some of the core optimization algorithms in Ceres Solver work.
 
 Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
 variables, and
@@ -27,15 +28,15 @@
           L \le x \le U
   :label: nonlinsq
 
-Where, :math:`L` and :math:`U` are lower and upper bounds on the
-parameter vector :math:`x`.
+Where, :math:`L` and :math:`U` are vector lower and upper bounds on
+the parameter vector :math:`x`. The inequality holds component-wise.
 
 Since the efficient global minimization of :eq:`nonlinsq` for
 general :math:`F(x)` is an intractable problem, we will have to settle
 for finding a local minimum.
 
 In the following, the Jacobian :math:`J(x)` of :math:`F(x)` is an
-:math:`m\times n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)`
+:math:`m\times n` matrix, where :math:`J_{ij}(x) = D_j f_i(x)`
 and the gradient vector is :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2
 = J(x)^\top F(x)`.
 
@@ -75,7 +76,7 @@
 Trust region methods are in some sense dual to line search methods:
 trust region methods first choose a step size (the size of the trust
 region) and then a step direction while line search methods first
-choose a step direction and then a step size. Ceres implements
+choose a step direction and then a step size. Ceres Solver implements
 multiple algorithms in both categories.
 
 .. _section-trust-region-methods:
@@ -152,8 +153,9 @@
 solving an unconstrained optimization of the form
 
 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda  \|D(x)\Delta x\|^2
+   :label: lsqr-naive
 
-Where, :math:`\lambda` is a Lagrange multiplier that is inverse
+Where, :math:`\lambda` is a Lagrange multiplier that is inversely
 related to :math:`\mu`. In Ceres, we solve for
 
 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
@@ -162,39 +164,47 @@
 The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
 the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.
 
-Before going further, let us make some notational simplifications. We
-will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been concatenated
-at the bottom of the matrix :math:`J` and similarly a vector of zeros
-has been added to the bottom of the vector :math:`f` and the rest of
-our discussion will be in terms of :math:`J` and :math:`F`, i.e, the
-linear least squares problem.
+Before going further, let us make some notational simplifications.
+
+We will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been
+concatenated at the bottom of the matrix :math:`J(x)` and a
+corresponding vector of zeroes has been added to the bottom of
+:math:`F(x)`, i.e.:
+
+.. math:: J(x) = \begin{bmatrix} J(x) \\ \frac{1}{\sqrt{\mu}} D
+          \end{bmatrix},\quad F(x) = \begin{bmatrix} F(x) \\ 0
+          \end{bmatrix}.
+
+This allows us to re-write :eq:`lsqr` as
 
 .. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2 .
    :label: simple
 
-For all but the smallest problems the solution of :eq:`simple` in
-each iteration of the Levenberg-Marquardt algorithm is the dominant
-computational cost in Ceres. Ceres provides a number of different
-options for solving :eq:`simple`. There are two major classes of
-methods - factorization and iterative.
+and only talk about :math:`J(x)` and :math:`F(x)` going forward.
+
+For all but the smallest problems the solution of :eq:`simple` in each
+iteration of the Levenberg-Marquardt algorithm is the dominant
+computational cost. Ceres provides a number of different options for
+solving :eq:`simple`. There are two major classes of methods -
+factorization and iterative.
 
 The factorization methods are based on computing an exact solution of
-:eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
-step Levenberg-Marquardt algorithm. But it is not clear if an exact
-solution of :eq:`lsqr` is necessary at each step of the LM algorithm
-to solve :eq:`nonlinsq`. In fact, we have already seen evidence
-that this may not be the case, as :eq:`lsqr` is itself a regularized
-version of :eq:`linearapprox`. Indeed, it is possible to
-construct non-linear optimization algorithms in which the linearized
-problem is solved approximately. These algorithms are known as inexact
-Newton or truncated Newton methods [NocedalWright]_.
+:eq:`lsqr` using a Cholesky or a QR factorization and lead to the so
+called exact step Levenberg-Marquardt algorithm. But it is not clear
+if an exact solution of :eq:`lsqr` is necessary at each step of the
+Levenberg-Marquardt algorithm. We have already seen evidence that
+this may not be the case, as :eq:`lsqr` is itself a regularized
+version of :eq:`linearapprox`. Indeed, it is possible to construct
+non-linear optimization algorithms in which the linearized problem is
+solved approximately. These algorithms are known as inexact Newton or
+truncated Newton methods [NocedalWright]_.
 
 An inexact Newton method requires two ingredients. First, a cheap
 method for approximately solving systems of linear
 equations. Typically an iterative linear solver like the Conjugate
-Gradients method is used for this
-purpose [NocedalWright]_. Second, a termination rule for
-the iterative solver. A typical termination rule is of the form
+Gradients method is used for this purpose [NocedalWright]_. Second, a
+termination rule for the iterative solver. A typical termination rule
+is of the form
 
 .. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
    :label: inexact
@@ -212,14 +222,18 @@
 iterative linear solver, the inexact step Levenberg-Marquardt
 algorithm is used.
 
+We will talk more about the various linear solvers that you can use in
+:ref:`section-linear-solver`.
+
 .. _section-dogleg:
 
 Dogleg
 ------
 
 Another strategy for solving the trust region problem :eq:`trp` was
-introduced by M. J. D. Powell. The key idea there is to compute two
-vectors
+introduced by
+`M. J. D. Powell <https://en.wikipedia.org/wiki/Michael_J._D._Powell>`_. The
+key idea there is to compute two vectors
 
 .. math::
 
@@ -253,10 +267,14 @@
 Levenberg-Marquardt solves the linear approximation from scratch with
 a smaller value of :math:`\mu`. Dogleg on the other hand, only needs
 to compute the interpolation between the Gauss-Newton and the Cauchy
-vectors, as neither of them depend on the value of :math:`\mu`.
+vectors, as neither of them depend on the value of :math:`\mu`. As a
+result the Dogleg method only solves one linear system per successful
+step, while Levenberg-Marquardt may need to solve an arbitrary number
+of linear systems before it can make progress [LourakisArgyros]_.
 
-The Dogleg method can only be used with the exact factorization based
-linear solvers.
+A disadvantage of the Dogleg implementation in Ceres Solver is that it
+can only be used with exact factorization based linear solvers.
 
 .. _section-inner-iterations:
 
@@ -349,10 +367,10 @@
 enables the non-monotonic trust region algorithm as described by Conn,
 Gould & Toint in [Conn]_.
 
-Even though the value of the objective function may be larger
-than the minimum value encountered over the course of the
-optimization, the final parameters returned to the user are the
-ones corresponding to the minimum cost over all iterations.
+Even though the value of the objective function may be larger than the
+minimum value encountered over the course of the optimization, the
+final parameters returned to the user are the ones corresponding to
+the minimum cost over all iterations.
 
 The option to take non-monotonic steps is available for all trust
 region strategies.
@@ -363,11 +381,13 @@
 Line Search Methods
 ===================
 
-The line search method in Ceres Solver cannot handle bounds
-constraints right now, so it can only be used for solving
-unconstrained problems.
+.. NOTE::
 
-Line search algorithms
+   The line search method in Ceres Solver cannot handle bounds
+   constraints right now, so it can only be used for solving
+   unconstrained problems.
+
+The basic line search algorithm looks something like this:
 
    1. Given an initial point :math:`x`
    2. :math:`\Delta x = -H^{-1}(x) g(x)`
@@ -387,7 +407,7 @@
 direction :math:`\Delta x` and the method used for one dimensional
 optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
 primary source of computational complexity in these
-methods. Currently, Ceres Solver supports three choices of search
+methods. Currently, Ceres Solver supports four choices of search
 directions, all aimed at large scale problems.
 
 1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
@@ -405,8 +425,8 @@
 3. ``BFGS`` A generalization of the Secant method to multiple
    dimensions in which a full, dense approximation to the inverse
    Hessian is maintained and used to compute a quasi-Newton step
-   [NocedalWright]_.  BFGS is currently the best known general
-   quasi-Newton algorithm.
+   [NocedalWright]_.  ``BFGS`` and its limited memory variant ``LBFGS``
+   are currently the best known general quasi-Newton algorithms.
 
 4. ``LBFGS`` A limited memory approximation to the full ``BFGS``
    method in which the last `M` iterations are used to approximate the
@@ -414,26 +434,31 @@
    [ByrdNocedal]_.
 
 Currently Ceres Solver supports both a backtracking and interpolation
-based Armijo line search algorithm, and a sectioning / zoom
-interpolation (strong) Wolfe condition line search algorithm.
-However, note that in order for the assumptions underlying the
-``BFGS`` and ``LBFGS`` methods to be guaranteed to be satisfied the
-Wolfe line search algorithm should be used.
+based `Armijo line search algorithm
+<https://en.wikipedia.org/wiki/Backtracking_line_search>`_
+(``ARMIJO``), and a sectioning / zoom interpolation (strong) `Wolfe
+condition line search algorithm
+<https://en.wikipedia.org/wiki/Wolfe_conditions>`_ (``WOLFE``).
+
+.. NOTE::
+
+   In order for the assumptions underlying the ``BFGS`` and ``LBFGS``
+   methods to be satisfied the ``WOLFE`` algorithm must be used.
 
 .. _section-linear-solver:
 
-LinearSolver
-============
+Linear Solvers
+==============
 
-Recall that in both of the trust-region methods described above, the
+Observe that for both of the trust-region methods described above, the
 key computational cost is the solution of a linear least squares
 problem of the form
 
-.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
+.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2 .
    :label: simple2
 
 Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
-f(x)`. For notational convenience let us also drop the dependence on
+F(x)`. For notational convenience let us also drop the dependence on
 :math:`x`. Then it is easy to see that solving :eq:`simple2` is
 equivalent to solving the *normal equations*.
 
@@ -444,31 +469,65 @@
 
 .. _section-qr:
 
-``DENSE_QR``
-------------
+DENSE_QR
+--------
 
 For small problems (a couple of hundred parameters and a few thousand
-residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
-of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
-:math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
-an upper triangular matrix [TrefethenBau]_. Then it can be shown that
-the solution to :eq:`normal` is given by
+residuals) with relatively dense Jacobians, QR-decomposition is the
+method of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition
+of :math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R`
+is an upper triangular matrix [TrefethenBau]_. Then it can be shown
+that the solution to :eq:`normal` is given by
 
 .. math:: \Delta x^* = -R^{-1}Q^\top f
 
+You can use QR-decomposition by setting
+:member:`Solver::Options::linear_solver_type` to ``DENSE_QR``.
 
-Ceres uses ``Eigen`` 's dense QR factorization routines.
+By default (``Solver::Options::dense_linear_algebra_library_type =
+EIGEN``) Ceres Solver will use `Eigen Householder QR factorization
+<https://eigen.tuxfamily.org/dox-devel/classEigen_1_1HouseholderQR.html>`_.
 
-.. _section-cholesky:
+If Ceres Solver has been built with an optimized LAPACK
+implementation, then the user can also choose to use LAPACK's
+`DGEQRF`_ routine by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``LAPACK``. Depending on the `LAPACK` and the underlying `BLAS`
+implementation this may perform better than using Eigen's Householder
+QR factorization.
 
-``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
-------------------------------------------------------
+.. _DGEQRF: https://netlib.org/lapack/explore-html/df/dc5/group__variants_g_ecomputational_ga3766ea903391b5cf9008132f7440ec7b.html
 
-Large non-linear least square problems are usually sparse. In such
-cases, using a dense QR factorization is inefficient. Let :math:`H =
-R^\top R` be the Cholesky factorization of the normal equations, where
-:math:`R` is an upper triangular matrix, then the solution to
-:eq:`normal` is given by
+
+If an NVIDIA GPU is available and Ceres Solver has been built with
+CUDA support enabled, then the user can also choose to perform the
+QR-decomposition on the GPU by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``CUDA``. Depending on the GPU this can lead to a substantial
+speedup. Using CUDA only makes sense for moderate to large sized
+problems. This is because to perform the decomposition on the GPU the
+matrix :math:`J` needs to be transferred from the CPU to the GPU and
+this incurs a cost. So unless the speedup from doing the decomposition
+on the GPU is large enough to also account for the time taken to
+transfer the Jacobian to the GPU, using CUDA will not be better than
+just doing the decomposition on the CPU.
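+
+A minimal sketch of selecting this solver (the ``CUDA`` line only
+makes sense if Ceres Solver was built with CUDA support):
+
+.. code-block:: c++
+
+   ceres::Solver::Options options;
+   options.linear_solver_type = ceres::DENSE_QR;
+   // Optional: off-load the factorization to the GPU.
+   options.dense_linear_algebra_library_type = ceres::CUDA;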
+
+.. _section-dense-normal-cholesky:
+
+DENSE_NORMAL_CHOLESKY
+---------------------
+
+It is often the case that the number of rows in the Jacobian :math:`J`
+is much larger than the number of columns. The complexity of QR
+factorization scales linearly with the number of rows, so beyond a
+certain size it is more efficient to solve :eq:`normal` using a dense
+`Cholesky factorization
+<https://en.wikipedia.org/wiki/Cholesky_decomposition>`_.
+
+Let :math:`H = R^\top R` be the Cholesky factorization of the normal
+equations, where :math:`R` is an upper triangular matrix, then the
+solution to :eq:`normal` is given by
 
 .. math::
 
@@ -479,69 +538,155 @@
 factorization of :math:`H` is the same upper triangular matrix
 :math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
 orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
-Q^\top Q R = R^\top R`. There are two variants of Cholesky
-factorization -- sparse and dense.
+Q^\top Q R = R^\top R`.
 
-``DENSE_NORMAL_CHOLESKY``  as the name implies performs a dense
-Cholesky factorization of the normal equations. Ceres uses
-``Eigen`` 's dense LDLT factorization routines.
+Unfortunately, forming the matrix :math:`H = J^\top J` squares the
+condition number. As a result while the cost of forming :math:`H` and
+computing its Cholesky factorization is lower than computing the
+QR-factorization of :math:`J`, we pay the price in terms of increased
+numerical instability and potential failure of the Cholesky
+factorization for ill-conditioned Jacobians.
 
-``SPARSE_NORMAL_CHOLESKY``, as the name implies performs a sparse
-Cholesky factorization of the normal equations. This leads to
-substantial savings in time and memory for large sparse
-problems. Ceres uses the sparse Cholesky factorization routines in
-Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_
-or the sparse Cholesky factorization algorithm in ``Eigen`` (which
-incidently is a port of the algorithm implemented inside ``CXSparse``)
+You can use dense Cholesky factorization by setting
+:member:`Solver::Options::linear_solver_type` to
+``DENSE_NORMAL_CHOLESKY``.
+
+By default (``Solver::Options::dense_linear_algebra_library_type =
+EIGEN``) Ceres Solver will use `Eigen's LLT factorization`_ routine.
+
+.. _Eigen's LLT Factorization:  https://eigen.tuxfamily.org/dox/classEigen_1_1LLT.html
+
+If Ceres Solver has been built with an optimized LAPACK
+implementation, then the user can also choose to use LAPACK's
+`DPOTRF`_ routine by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``LAPACK``. Depending on the `LAPACK` and the underlying `BLAS`
+implementation this may perform better than using Eigen's Cholesky
+factorization.
+
+.. _DPOTRF: https://www.netlib.org/lapack/explore-html/d1/d7a/group__double_p_ocomputational_ga2f55f604a6003d03b5cd4a0adcfb74d6.html
+
+If an NVIDIA GPU is available and Ceres Solver has been built with
+CUDA support enabled, then the user can also choose to perform the
+Cholesky factorization on the GPU by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``CUDA``. Depending on the GPU this can lead to a substantial speedup.
+Using CUDA only makes sense for moderate to large sized problems. This
+is because to perform the decomposition on the GPU the matrix
+:math:`H` needs to be transferred from the CPU to the GPU and this
+incurs a cost. So unless the speedup from doing the decomposition on
+the GPU is large enough to also account for the time taken to transfer
+the Jacobian to the GPU, using CUDA will not be better than just doing
+the decomposition on the CPU.
+
+
+.. _section-sparse-normal-cholesky:
+
+SPARSE_NORMAL_CHOLESKY
+----------------------
+
+Large non-linear least square problems are usually sparse. In such
+cases, using a dense QR or Cholesky factorization is inefficient. For
+such problems, Cholesky factorization routines which treat :math:`H`
+as a sparse matrix and compute a sparse factor :math:`R` are better
+suited [Davis]_. This can lead to substantial savings in memory and
+CPU time for large sparse problems.
+
+You can use sparse Cholesky factorization by setting
+:member:`Solver::Options::linear_solver_type` to
+``SPARSE_NORMAL_CHOLESKY``.
+
+The use of this linear solver requires that Ceres is compiled with
+support for at least one of:
+
+ 1. `SuiteSparse <https://people.engr.tamu.edu/davis/suitesparse.html>`_ (``SUITE_SPARSE``).
+ 2. `Apple's Accelerate framework
+    <https://developer.apple.com/documentation/accelerate/sparse_solvers?language=objc>`_
+    (``ACCELERATE_SPARSE``).
+ 3. `Eigen's sparse linear solvers
+    <https://eigen.tuxfamily.org/dox/group__SparseCholesky__Module.html>`_
+    (``EIGEN_SPARSE``).
+
+SuiteSparse and Accelerate offer high performance sparse Cholesky
+factorization routines as they use level-3 BLAS routines
+internally. Eigen's sparse Cholesky routines are *simplicial* and do
+not use dense linear algebra routines, and as a result cannot compete
+with SuiteSparse and Accelerate, especially on large problems. To get
+the best performance out of SuiteSparse it should be linked to high
+quality BLAS and LAPACK implementations, e.g. `ATLAS
+<https://math-atlas.sourceforge.net/>`_, `OpenBLAS
+<https://www.openblas.net/>`_ or `Intel MKL
+<https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html>`_.
+
+A critical part of a sparse Cholesky factorization routine is the use
+of a fill-reducing ordering. By default Ceres Solver uses the Approximate
+Minimum Degree (``AMD``) ordering, which usually performs well, but
+there are other options that may perform better depending on the
+actual sparsity structure of the Jacobian. See :ref:`section-ordering`
+for more details.
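+
+A sketch of a typical configuration (the chosen sparse backend must be
+one that Ceres Solver was built with):
+
+.. code-block:: c++
+
+   ceres::Solver::Options options;
+   options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
+   options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;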
 
 .. _section-cgnr:
 
-``CGNR``
---------
+CGNR
+----
 
-For general sparse problems, if the problem is too large for
-``CHOLMOD`` or a sparse linear algebra library is not linked into
-Ceres, another option is the ``CGNR`` solver. This solver uses the
-Conjugate Gradients solver on the *normal equations*, but without
-forming the normal equations explicitly. It exploits the relation
+For general sparse problems, if the problem is too large for sparse
+Cholesky factorization or a sparse linear algebra library is not
+linked into Ceres, another option is the ``CGNR`` solver. This solver
+uses the `Conjugate Gradients
+<https://en.wikipedia.org/wiki/Conjugate_gradient_method>`_ method on
+the *normal equations*, but without forming the normal equations
+explicitly. It exploits the relation
 
 .. math::
     H x = J^\top J x = J^\top(J x)
 
-The convergence of Conjugate Gradients depends on the conditioner
-number :math:`\kappa(H)`. Usually :math:`H` is poorly conditioned and
-a :ref:`section-preconditioner` must be used to get reasonable
-performance. Currently only the ``JACOBI`` preconditioner is available
-for use with ``CGNR``. It uses the block diagonal of :math:`H` to
-precondition the normal equations.
+Because ``CGNR`` never solves the linear system exactly, when the user
+chooses ``CGNR`` as the linear solver, Ceres automatically switches
+from the exact step algorithm to an inexact step algorithm. This also
+means that ``CGNR`` can only be used with the ``LEVENBERG_MARQUARDT``
+and not with the ``DOGLEG`` trust region strategy.
 
-When the user chooses ``CGNR`` as the linear solver, Ceres
-automatically switches from the exact step algorithm to an inexact
-step algorithm.
+``CGNR`` by default runs on the CPU. However, if an NVIDIA GPU is
+available and Ceres Solver has been built with CUDA support enabled,
+then the user can also choose to run ``CGNR`` on the GPU by setting
+:member:`Solver::Options::sparse_linear_algebra_library_type` to
+``CUDA_SPARSE``. The key complexity of ``CGNR`` comes from evaluating
+the two sparse matrix-vector products (SpMV) :math:`Jx` and
+:math:`J^\top y`. GPUs are particularly well suited for doing sparse
+matrix-vector products. As a result, for large problems using a GPU
+can lead to a substantial speedup.
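+
+A sketch of running ``CGNR`` on the GPU (assuming a CUDA enabled
+build of Ceres Solver):
+
+.. code-block:: c++
+
+   ceres::Solver::Options options;
+   options.linear_solver_type = ceres::CGNR;
+   options.sparse_linear_algebra_library_type = ceres::CUDA_SPARSE;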
+
+The convergence of Conjugate Gradients depends on the condition
+number :math:`\kappa(H)`. Usually :math:`H` is quite poorly
+conditioned and a `Preconditioner
+<https://en.wikipedia.org/wiki/Preconditioner>`_ must be used to get
+reasonable performance. See section on :ref:`section-preconditioner`
+for more details.
 
 .. _section-schur:
 
-``DENSE_SCHUR`` & ``SPARSE_SCHUR``
-----------------------------------
+DENSE_SCHUR & SPARSE_SCHUR
+--------------------------
 
 While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
-adjustment problems, bundle adjustment problem have a special
-structure, and a more efficient scheme for solving :eq:`normal`
-can be constructed.
+adjustment problems, they have a special sparsity structure that can
+be exploited to solve the normal equations more efficiently.
 
-Suppose that the SfM problem consists of :math:`p` cameras and
-:math:`q` points and the variable vector :math:`x` has the block
-structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`. Where,
-:math:`y` and :math:`z` correspond to camera and point parameters,
-respectively.  Further, let the camera blocks be of size :math:`c` and
-the point blocks be of size :math:`s` (for most problems :math:`c` =
-:math:`6`--`9` and :math:`s = 3`). Ceres does not impose any constancy
-requirement on these block sizes, but choosing them to be constant
-simplifies the exposition.
+Suppose that the bundle adjustment problem consists of :math:`p`
+cameras and :math:`q` points and the variable vector :math:`x` has the
+block structure :math:`x = [y_{1}, ... ,y_{p},z_{1},
+... ,z_{q}]`. Where, :math:`y` and :math:`z` correspond to camera and
+point parameters respectively.  Further, let the camera blocks be of
+size :math:`c` and the point blocks be of size :math:`s` (for most
+problems :math:`c` = :math:`6`--`9` and :math:`s = 3`). Ceres does not
+impose any constancy requirement on these block sizes, but choosing
+them to be constant simplifies the exposition.
 
-A key characteristic of the bundle adjustment problem is that there is
-no term :math:`f_{i}` that includes two or more point blocks.  This in
-turn implies that the matrix :math:`H` is of the form
+The key property of bundle adjustment problems which we will exploit
+is the fact that no term :math:`f_{i}` in :eq:`nonlinsq` includes two
+or more point blocks at the same time.  This in turn implies that the
+matrix :math:`H` is of the form
 
 .. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
    :label: hblock
@@ -585,123 +730,208 @@
 observe at least one common point.
 
 
-Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
-:math:`\Delta y`, and then back-substituting :math:`\Delta y` to
+Now :eq:`linear2` can be solved by first forming :math:`S`, solving
+for :math:`\Delta y`, and then back-substituting :math:`\Delta y` to
 obtain the value of :math:`\Delta z`.  Thus, the solution of what was
 an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
 inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
 and matrix-vector multiplies, and the solution of block sparse
 :math:`pc\times pc` linear system :eq:`schur`.  For almost all
 problems, the number of cameras is much smaller than the number of
-points, :math:`p \ll q`, thus solving :eq:`schur` is
-significantly cheaper than solving :eq:`linear2`. This is the
-*Schur complement trick* [Brown]_.
+points, :math:`p \ll q`, thus solving :eq:`schur` is significantly
+cheaper than solving :eq:`linear2`. This is the *Schur complement
+trick* [Brown]_.
 
-This still leaves open the question of solving :eq:`schur`. The
-method of choice for solving symmetric positive definite systems
-exactly is via the Cholesky factorization [TrefethenBau]_ and
-depending upon the structure of the matrix, there are, in general, two
-options. The first is direct factorization, where we store and factor
-:math:`S` as a dense matrix [TrefethenBau]_. This method has
+This still leaves open the question of solving :eq:`schur`. As we
+discussed when considering the exact solution of the normal equations
+using Cholesky factorization, we have two options.
+
+1. ``DENSE_SCHUR`` - The first is **dense Cholesky factorization**,
+where we store and factor :math:`S` as a dense matrix. This method has
 :math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
-is only practical for problems with up to a few hundred cameras. Ceres
-implements this strategy as the ``DENSE_SCHUR`` solver.
+is only practical for problems with up to a few hundred cameras.
 
-
-But, :math:`S` is typically a fairly sparse matrix, as most images
-only see a small fraction of the scene. This leads us to the second
-option: Sparse Direct Methods. These methods store :math:`S` as a
+2. ``SPARSE_SCHUR`` - For large bundle adjustment problems :math:`S`
+is typically a fairly sparse matrix, as most images only see a small
+fraction of the scene. This leads us to the second option: **sparse
+Cholesky factorization** [Davis]_.  Here we store :math:`S` as a
 sparse matrix, use row and column re-ordering algorithms to maximize
 the sparsity of the Cholesky decomposition, and focus their compute
-effort on the non-zero part of the factorization [Chen]_. Sparse
-direct methods, depending on the exact sparsity structure of the Schur
-complement, allow bundle adjustment algorithms to significantly scale
-up over those based on dense factorization. Ceres implements this
-strategy as the ``SPARSE_SCHUR`` solver.
+effort on the non-zero part of the factorization [Davis]_
+[Chen]_. Sparse direct methods, depending on the exact sparsity
+structure of the Schur complement, allow bundle adjustment algorithms
+to scale to scenes with thousands of cameras.
+
 
 .. _section-iterative_schur:
 
-``ITERATIVE_SCHUR``
--------------------
+ITERATIVE_SCHUR
+---------------
 
-Another option for bundle adjustment problems is to apply
-Preconditioned Conjugate Gradients to the reduced camera matrix
-:math:`S` instead of :math:`H`. One reason to do this is that
-:math:`S` is a much smaller matrix than :math:`H`, but more
-importantly, it can be shown that :math:`\kappa(S)\leq \kappa(H)`.
+Another option for bundle adjustment problems is to apply Conjugate
+Gradients to the reduced camera matrix :math:`S` instead of
+:math:`H`. One reason to do this is that :math:`S` is a much smaller
+matrix than :math:`H`, but more importantly, it can be shown that
+:math:`\kappa(S)\leq \kappa(H)` [Agarwal]_.
+
 Ceres implements Conjugate Gradients on :math:`S` as the
 ``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
 as the linear solver, Ceres automatically switches from the exact step
 algorithm to an inexact step algorithm.
 
+
 The key computational operation when using Conjugate Gradients is the
 evaluation of the matrix vector product :math:`Sx` for an arbitrary
-vector :math:`x`. There are two ways in which this product can be
-evaluated, and this can be controlled using
-``Solver::Options::use_explicit_schur_complement``. Depending on the
-problem at hand, the performance difference between these two methods
-can be quite substantial.
+vector :math:`x`. Because Conjugate Gradients only needs access to
+:math:`S` via its product with a vector, one way to evaluate
+:math:`Sx` is to observe that
 
-  1. **Implicit** This is default. Implicit evaluation is suitable for
-     large problems where the cost of computing and storing the Schur
-     Complement :math:`S` is prohibitive. Because PCG only needs
-     access to :math:`S` via its product with a vector, one way to
-     evaluate :math:`Sx` is to observe that
+.. math::  x_1 &= E^\top x\\
+           x_2 &= C^{-1} x_1\\
+           x_3 &= Ex_2\\
+           x_4 &= Bx\\
+           Sx &= x_4 - x_3
+   :label: schurtrick1
 
-     .. math::  x_1 &= E^\top x\\
-                x_2 &= C^{-1} x_1\\
-                x_3 &= Ex_2\\
-                x_4 &= Bx\\
-                Sx &= x_4 - x_3
-        :label: schurtrick1
+Thus, we can run Conjugate Gradients on :math:`S` with the same
+computational effort per iteration as Conjugate Gradients on
+:math:`H`, while reaping the benefits of a more powerful
+preconditioner. In fact, we do not even need to compute :math:`H`,
+:eq:`schurtrick1` can be implemented using just the columns of
+:math:`J`.
 
-     Thus, we can run PCG on :math:`S` with the same computational
-     effort per iteration as PCG on :math:`H`, while reaping the
-     benefits of a more powerful preconditioner. In fact, we do not
-     even need to compute :math:`H`, :eq:`schurtrick1` can be
-     implemented using just the columns of :math:`J`.
+Equation :eq:`schurtrick1` is closely related to *Domain Decomposition
+methods* for solving large linear systems that arise in structural
+engineering and partial differential equations. In the language of
+Domain Decomposition, each point in a bundle adjustment problem is a
+domain, and the cameras form the interface between these domains. The
+iterative solution of the Schur complement then falls within the
+sub-category of techniques known as Iterative Sub-structuring [Saad]_
+[Mathew]_.
 
-     Equation :eq:`schurtrick1` is closely related to *Domain
-     Decomposition methods* for solving large linear systems that
-     arise in structural engineering and partial differential
-     equations. In the language of Domain Decomposition, each point in
-     a bundle adjustment problem is a domain, and the cameras form the
-     interface between these domains. The iterative solution of the
-     Schur complement then falls within the sub-category of techniques
-     known as Iterative Sub-structuring [Saad]_ [Mathew]_.
+While in most cases the above method for evaluating :math:`Sx` is the
+way to go, for some problems it is better to compute the Schur
+complemenent :math:`S` explicitly and then run Conjugate Gradients on
+it. This can be done by settin
+``Solver::Options::use_explicit_schur_complement`` to ``true``. This
+option can only be used with the ``SCHUR_JACOBI`` preconditioner.
 
-  2. **Explicit** The complexity of implicit matrix-vector product
-     evaluation scales with the number of non-zeros in the
-     Jacobian. For small to medium sized problems, the cost of
-     constructing the Schur Complement is small enough that it is
-     better to construct it explicitly in memory and use it to
-     evaluate the product :math:`Sx`.
 
-When the user chooses ``ITERATIVE_SCHUR`` as the linear solver, Ceres
-automatically switches from the exact step algorithm to an inexact
-step algorithm.
+.. _section-schur_power_series_expansion:
 
-  .. NOTE::
+SCHUR_POWER_SERIES_EXPANSION
+----------------------------
 
-     In exact arithmetic, the choice of implicit versus explicit Schur
-     complement would have no impact on solution quality. However, in
-     practice if the Jacobian is poorly conditioned, one may observe
-     (usually small) differences in solution quality. This is a
-     natural consequence of performing computations in finite arithmetic.
+It can be shown that the inverse of the Schur complement can be
+written as an infinite power-series [Weber]_ [Zheng]_:
 
+.. math:: S      &= B - EC^{-1}E^\top\\
+                 &= B(I - B^{-1}EC^{-1}E^\top)\\
+          S^{-1} &= (I - B^{-1}EC^{-1}E^\top)^{-1} B^{-1}\\
+                 &= \sum_{i=0}^\infty \left(B^{-1}EC^{-1}E^\top\right)^{i} B^{-1}
+
+As a result a truncated version of this power series expansion can be
+used to approximate the inverse and therefore the solution to
+:eq:`schur`. Ceres allows the user to use Schur power series expansion
+in three ways.
+
+1. As a linear solver. This is what [Weber]_ calls **Power Bundle
+   Adjustment** and corresponds to using the truncated power series to
+   approximate the inverse of the Schur complement. This is done by
+   setting the following options.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR
+      Solver::Options::preconditioner_type = IDENTITY
+      Solver::Options::use_spse_initialization = true
+      Solver::Options::max_linear_solver_iterations = 0;
+
+      // The following two settings are worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+      Solver::Options::spse_tolerance = 0.1;
+
+
+2. As a preconditioner for ``ITERATIVE_SCHUR``. Any method for
+   approximating the inverse of a matrix can also be used as a
+   preconditioner. This is enabled by setting the following options.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR
+      Solver::Options::preconditioner_type = SCHUR_POWER_SERIES_EXPANSION;
+      Solver::Options::use_spse_initialization = false;
+
+      // This is worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+
+
+3. As initialization for ``ITERATIVE_SCHUR`` with any
+   preconditioner. This is a combination of the above two, where the
+   Schur Power Series Expansion is used to compute an initial guess
+   for the solution, which is then refined using ``ITERATIVE_SCHUR``
+   with the preconditioner of your choice.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR
+      Solver::Options::preconditioner_type = ... // Preconditioner of your choice.
+      Solver::Options::use_spse_initialization = true
+      Solver::Options::max_linear_solver_iterations = 0;
+
+      // The following two settings are worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+      // This only affects the initialization but not the preconditioner.
+      Solver::Options::spse_tolerance = 0.1;
+
+
+.. _section-mixed-precision:
+
+Mixed Precision Solves
+======================
+
+Generally speaking Ceres Solver does all its arithmetic in double
+precision. Sometimes though, one can use single precision arithmetic
+to get substantial speedups. Currently, for linear solvers that
+perform Cholesky factorization (sparse or dense) the user has the
+option to cast the linear system to single precision and then use
+single precision Cholesky factorization routines to solve the
+resulting linear system. This can be enabled by setting
+:member:`Solver::Options::use_mixed_precision_solves` to ``true``.
+
+Depending on the conditioning of the problem, the use of single
+precision factorization may lead to some loss of accuracy. Some of
+this accuracy can be recovered by performing `Iterative Refinement
+<https://en.wikipedia.org/wiki/Iterative_refinement>`_. The number of
+iterations of iterative refinement is controlled by
+:member:`Solver::Options::max_num_refinement_iterations`. The default
+value of this parameter is zero, which means if
+:member:`Solver::Options::use_mixed_precision_solves` is ``true``,
+then no iterative refinement is performed. Usually 2-3 refinement
+iterations are enough.
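+
+A minimal sketch of enabling mixed precision solves together with a
+couple of refinement iterations (the sparse backend here is just an
+example):
+
+.. code-block:: c++
+
+   ceres::Solver::Options options;
+   options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
+   options.sparse_linear_algebra_library_type = ceres::EIGEN_SPARSE;
+   options.use_mixed_precision_solves = true;
+   options.max_num_refinement_iterations = 3;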
+
+Mixed precision solves are available in the following linear solver
+configurations:
+
+1. ``DENSE_NORMAL_CHOLESKY`` + ``EIGEN``/ ``LAPACK`` / ``CUDA``.
+2. ``DENSE_SCHUR`` + ``EIGEN``/ ``LAPACK`` / ``CUDA``.
+3. ``SPARSE_NORMAL_CHOLESKY`` + ``EIGEN_SPARSE`` / ``ACCELERATE_SPARSE``
+4. ``SPARSE_SCHUR`` + ``EIGEN_SPARSE`` / ``ACCELERATE_SPARSE``
+
+Mixed precision solves are not available when using ``SUITE_SPARSE``
+as the sparse linear algebra backend because SuiteSparse/CHOLMOD does
+not support single precision solves.
 
 .. _section-preconditioner:
 
-Preconditioner
-==============
+Preconditioners
+===============
 
-The convergence rate of Conjugate Gradients for
-solving :eq:`normal` depends on the distribution of eigenvalues
-of :math:`H` [Saad]_. A useful upper bound is
-:math:`\sqrt{\kappa(H)}`, where, :math:`\kappa(H)` is the condition
-number of the matrix :math:`H`. For most bundle adjustment problems,
-:math:`\kappa(H)` is high and a direct application of Conjugate
-Gradients to :eq:`normal` results in extremely poor performance.
+The convergence rate of Conjugate Gradients for solving :eq:`normal`
+depends on the distribution of eigenvalues of :math:`H` [Saad]_. A
+useful upper bound is :math:`\sqrt{\kappa(H)}`, where,
+:math:`\kappa(H)` is the condition number of the matrix :math:`H`. For
+most non-linear least squares problems, :math:`\kappa(H)` is high and
+a direct application of Conjugate Gradients to :eq:`normal` results in
+extremely poor performance.
 
 The solution to this problem is to replace :eq:`normal` with a
 *preconditioned* system.  Given a linear system, :math:`Ax =b` and a
@@ -733,8 +963,15 @@
 problems with general sparsity as well as the special sparsity
 structure encountered in bundle adjustment problems.
 
-``JACOBI``
-----------
+IDENTITY
+--------
+
+This is equivalent to using an identity matrix as a preconditioner,
+i.e. no preconditioner at all.
+
+
+JACOBI
+------
 
 The simplest of all preconditioners is the diagonal or Jacobi
 preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
@@ -742,24 +979,27 @@
 block Jacobi preconditioner. The ``JACOBI`` preconditioner in Ceres
 when used with :ref:`section-cgnr` refers to the block diagonal of
 :math:`H` and when used with :ref:`section-iterative_schur` refers to
-the block diagonal of :math:`B` [Mandel]_. For detailed performance
-data about the performance of ``JACOBI`` on bundle adjustment problems
-see [Agarwal]_.
+the block diagonal of :math:`B` [Mandel]_.
+
+For detailed data about the performance of ``JACOBI`` on bundle
+adjustment problems see [Agarwal]_.
 
 
-``SCHUR_JACOBI``
-----------------
+SCHUR_JACOBI
+------------
 
 Another obvious choice for :ref:`section-iterative_schur` is the block
 diagonal of the Schur complement matrix :math:`S`, i.e, the block
 Jacobi preconditioner for :math:`S`. In Ceres we refer to it as the
-``SCHUR_JACOBI`` preconditioner.  For detailed performance data about
-the performance of ``SCHUR_JACOBI`` on bundle adjustment problems see
-[Agarwal]_.
+``SCHUR_JACOBI`` preconditioner.
 
 
-``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``
-----------------------------------------------
+For detailed data about the performance of ``SCHUR_JACOBI`` on bundle
+adjustment problems see [Agarwal]_.
+
+
+CLUSTER_JACOBI and CLUSTER_TRIDIAGONAL
+--------------------------------------
 
 For bundle adjustment problems arising in reconstruction from
 community photo collections, more effective preconditioners can be
@@ -790,8 +1030,19 @@
 algorithm. The choice of clustering algorithm is controlled by
 :member:`Solver::Options::visibility_clustering_type`.
 
-``SUBSET``
-----------
+SCHUR_POWER_SERIES_EXPANSION
+----------------------------
+
+As explained in :ref:`section-schur_power_series_expansion`, the Schur
+complement matrix admits a power series expansion and a truncated
+version of this power series can be used as a preconditioner for
+``ITERATIVE_SCHUR``. When used as a preconditioner
+:member:`Solver::Options::max_num_spse_iterations` controls the number
+of terms in the power series that are used.
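+
+A minimal sketch of selecting it as the preconditioner (the value of
+:member:`Solver::Options::max_num_spse_iterations` shown below is the
+default):
+
+.. code-block:: c++
+
+   ceres::Solver::Options options;
+   options.linear_solver_type = ceres::ITERATIVE_SCHUR;
+   options.preconditioner_type = ceres::SCHUR_POWER_SERIES_EXPANSION;
+   // Number of terms of the power series used by the preconditioner.
+   options.max_num_spse_iterations = 5;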
+
+
+SUBSET
+------
 
 This is a  preconditioner for problems with general  sparsity. Given a
 subset  of residual  blocks of  a problem,  it uses  the corresponding
@@ -811,6 +1062,8 @@
 :math:`Q` approximates :math:`J^\top J`, or how well the chosen
 residual blocks approximate the full problem.
 
+This preconditioner is NOT available when running ``CGNR`` using
+``CUDA``.
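+
+A sketch of configuring it; here ``subset_residual_blocks`` is assumed
+to be a container holding the ``ResidualBlockId`` values (as returned
+by ``Problem::AddResidualBlock``) of the residual blocks chosen to
+define the preconditioner:
+
+.. code-block:: c++
+
+   ceres::Solver::Options options;
+   options.linear_solver_type = ceres::CGNR;
+   options.preconditioner_type = ceres::SUBSET;
+   options.residual_blocks_for_subset_preconditioner.insert(
+       subset_residual_blocks.begin(), subset_residual_blocks.end());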
 
 .. _section-ordering:
 
@@ -821,33 +1074,36 @@
 have a significant impact on the efficiency and accuracy of the
 method. For example when doing sparse Cholesky factorization, there
 are matrices for which a good ordering will give a Cholesky factor
-with :math:`O(n)` storage, where as a bad ordering will result in an
+with :math:`O(n)` storage, whereas a bad ordering will result in a
 completely dense factor.
 
 Ceres allows the user to provide varying amounts of hints to the
 solver about the variable elimination ordering to use. This can range
-from no hints, where the solver is free to decide the best ordering
-based on the user's choices like the linear solver being used, to an
-exact order in which the variables should be eliminated, and a variety
-of possibilities in between.
+from no hints, where the solver is free to decide the best possible
+ordering based on the user's choices like the linear solver being
+used, to an exact order in which the variables should be eliminated,
+and a variety of possibilities in between.
 
-Instances of the :class:`ParameterBlockOrdering` class are used to
-communicate this information to Ceres.
+The simplest thing to do is to just set
+:member:`Solver::Options::linear_solver_ordering_type` to ``AMD``
+(default) or ``NESDIS`` based on your understanding of the problem or
+empirical testing.
 
-Formally an ordering is an ordered partitioning of the parameter
-blocks. Each parameter block belongs to exactly one group, and each
-group has a unique integer associated with it, that determines its
-order in the set of groups. We call these groups *Elimination Groups*
 
-Given such an ordering, Ceres ensures that the parameter blocks in the
-lowest numbered elimination group are eliminated first, and then the
-parameter blocks in the next lowest numbered elimination group and so
-on. Within each elimination group, Ceres is free to order the
-parameter blocks as it chooses. For example, consider the linear system
+More information can be communicated by using an instance of the
+:class:`ParameterBlockOrdering` class.
+
+Formally, an ordering is an ordered partitioning of the parameter
+blocks, i.e., each parameter block belongs to exactly one group, and
+each group has a unique non-negative integer associated with it that
+determines its order in the set of groups.
+
+For example, consider the linear system
 
 .. math::
-  x + y &= 3\\
-  2x + 3y &= 7
+   x + y &= 3 \\
+   2x + 3y &= 7
 
 There are two ways in which it can be solved. First eliminating
 :math:`x` from the two equations, solving for :math:`y` and then back
@@ -855,33 +1111,92 @@
 for :math:`x` and back substituting for :math:`y`. The user can
 construct three orderings here.
 
-1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
-2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
-3. :math:`\{0: x, y\}`        : Solver gets to decide the elimination order.
+1. :math:`\{0: x\}, \{1: y\}` - eliminate :math:`x` first.
+2. :math:`\{0: y\}, \{1: x\}` - eliminate :math:`y` first.
+3. :math:`\{0: x, y\}` - Solver gets to decide the elimination order.
 
-Thus, to have Ceres determine the ordering automatically using
-heuristics, put all the variables in the same elimination group. The
-identity of the group does not matter. This is the same as not
-specifying an ordering at all. To control the ordering for every
-variable, create an elimination group per variable, ordering them in
-the desired order.
+Thus, to have Ceres determine the ordering automatically, put all the
+variables in group 0; to control the ordering for every variable,
+create groups :math:`0 \dots N-1`, one per variable, in the desired
+order.
+
+``linear_solver_ordering == nullptr`` and an ordering where all the
+parameter blocks are in one elimination group mean the same thing -
+the solver is free to choose what it thinks is the best elimination
+ordering using the ordering algorithm (specified using
+:member:`Solver::Options::linear_solver_ordering_type`). Therefore in
+the following we will only consider the case where
+``linear_solver_ordering != nullptr``.
+
+The exact interpretation of the ``linear_solver_ordering`` depends on
+the values of :member:`Solver::Options::linear_solver_ordering_type`,
+:member:`Solver::Options::linear_solver_type`,
+:member:`Solver::Options::preconditioner_type` and
+:member:`Solver::Options::sparse_linear_algebra_library_type` as we will
+explain below.
+
+Bundle Adjustment
+-----------------
 
 If the user is using one of the Schur solvers (``DENSE_SCHUR``,
 ``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses to specify an
 ordering, it must have one important property. The lowest numbered
 elimination group must form an independent set in the graph
 corresponding to the Hessian, or in other words, no two parameter
-blocks in in the first elimination group should co-occur in the same
+blocks in the first elimination group should co-occur in the same
 residual block. For the best performance, this elimination group
 should be as large as possible. For standard bundle adjustment
 problems, this corresponds to the first elimination group containing
-all the 3d points, and the second containing the all the cameras
-parameter blocks.
+all the 3d points, and the second containing the parameter blocks for
+all the cameras.
 
 If the user leaves the choice to Ceres, then the solver uses an
 approximate maximum independent set algorithm to identify the first
 elimination group [LiSaad]_.
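+
+For example, a sketch of specifying such an ordering for a bundle
+adjustment problem; here ``points`` and ``cameras`` are hypothetical
+containers of the ``double*`` parameter block pointers already added
+to the problem, and ``options`` is the :class:`Solver::Options`
+instance being configured:
+
+.. code-block:: c++
+
+   auto ordering = std::make_shared<ceres::ParameterBlockOrdering>();
+   // Group 0 is eliminated first and contains all the point blocks.
+   for (double* point : points) {
+     ordering->AddElementToGroup(point, 0);
+   }
+   // Group 1 contains all the camera parameter blocks.
+   for (double* camera : cameras) {
+     ordering->AddElementToGroup(camera, 1);
+   }
+   options.linear_solver_ordering = ordering;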
 
+``sparse_linear_algebra_library_type = SUITE_SPARSE``
+-----------------------------------------------------
+
+**linear_solver_ordering_type = AMD**
+
+A constrained Approximate Minimum Degree (CAMD) ordering is used where
+the parameter blocks in the lowest numbered group are eliminated
+first, and then the parameter blocks in the next lowest numbered group
+and so on. Within each group, CAMD is free to order the parameter blocks
+as it chooses.
+
+**linear_solver_ordering_type = NESDIS**
+
+a. ``linear_solver_type = SPARSE_NORMAL_CHOLESKY`` or
+   ``linear_solver_type = CGNR`` and ``preconditioner_type = SUBSET``
+
+   The value of ``linear_solver_ordering`` is ignored and a Nested
+   Dissection algorithm is used to compute a fill reducing ordering.
+
+b. ``linear_solver_type = SPARSE_SCHUR/DENSE_SCHUR/ITERATIVE_SCHUR``
+
+   ONLY the lowest numbered group is used to compute the Schur
+   complement, and Nested Dissection is used to compute a fill
+   reducing ordering for the Schur complement (or its preconditioner).
+
+``sparse_linear_algebra_library_type = EIGEN_SPARSE/ACCELERATE_SPARSE``
+-----------------------------------------------------------------------
+
+a. ``linear_solver_type = SPARSE_NORMAL_CHOLESKY`` or
+   ``linear_solver_type = CGNR`` and ``preconditioner_type = SUBSET``
+
+   The value of ``linear_solver_ordering`` is ignored and ``AMD`` or
+   ``NESDIS`` is used to compute a fill reducing ordering as requested
+   by the user.
+
+b. ``linear_solver_type = SPARSE_SCHUR/DENSE_SCHUR/ITERATIVE_SCHUR``
+
+   ONLY the lowest numbered group is used to compute the Schur
+   complement, and ``AMD`` or ``NESDIS`` is used to compute a fill
+   reducing ordering for the Schur complement (or its preconditioner)
+   as requested by the user.
+
+
 .. _section-solver-options:
 
 :class:`Solver::Options`
@@ -892,7 +1207,7 @@
    :class:`Solver::Options` controls the overall behavior of the
    solver. We list the various settings and their default values below.
 
-.. function:: bool Solver::Options::IsValid(string* error) const
+.. function:: bool Solver::Options::IsValid(std::string* error) const
 
    Validate the values in the options struct and returns true on
    success. If there is a problem, the method returns false with
@@ -913,14 +1228,18 @@
    Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
    ``BFGS`` and ``LBFGS``.
 
+   See :ref:`section-line-search-methods` for more details.
+
 .. member:: LineSearchType Solver::Options::line_search_type
 
    Default: ``WOLFE``
 
    Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
    Note that in order for the assumptions underlying the ``BFGS`` and
-   ``LBFGS`` line search direction algorithms to be guaranteed to be
-   satisifed, the ``WOLFE`` line search should be used.
+   ``LBFGS`` line search direction algorithms to be satisfied, the
+   ``WOLFE`` line search must be used.
+
+   See :ref:`section-line-search-methods` for more details.
 
 .. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type
 
@@ -931,26 +1250,28 @@
 
 .. member:: int Solver::Options::max_lbfgs_rank
 
-   Default: 20
+   Default: ``20``
 
-   The L-BFGS hessian approximation is a low rank approximation to the
-   inverse of the Hessian matrix. The rank of the approximation
-   determines (linearly) the space and time complexity of using the
-   approximation. Higher the rank, the better is the quality of the
-   approximation. The increase in quality is however is bounded for a
-   number of reasons.
+   The LBFGS Hessian approximation is a low rank approximation to
+   the inverse of the Hessian matrix. The rank of the
+   approximation determines (linearly) the space and time
+   complexity of using the approximation. The higher the rank, the
+   better the quality of the approximation. The increase in
+   quality is however bounded for a number of reasons.
 
-     1. The method only uses secant information and not actual
-        derivatives.
+   1. The method only uses secant information and not actual
+   derivatives.
+   2. The Hessian approximation is constrained to be positive
+   definite.
 
-     2. The Hessian approximation is constrained to be positive
-        definite.
+   So increasing this rank to a large number will increase the time
+   and space complexity without a corresponding increase in solution
+   quality. There are no hard and fast rules for choosing the
+   maximum rank. The best choice usually requires some problem
+   specific experimentation.
 
-   So increasing this rank to a large number will cost time and space
-   complexity without the corresponding increase in solution
-   quality. There are no hard and fast rules for choosing the maximum
-   rank. The best choice usually requires some problem specific
-   experimentation.
+   For more theoretical and implementation details of the LBFGS
+   method, please see [Nocedal]_.
 
 .. member:: bool Solver::Options::use_approximate_eigenvalue_bfgs_scaling
 
@@ -999,6 +1320,8 @@
 
 .. member:: double Solver::Options::min_line_search_step_size
 
+   Default: ``1e-9``
+
    The line search terminates if:
 
    .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}
@@ -1153,7 +1476,8 @@
 
 .. member:: double Solver::Options::max_solver_time_in_seconds
 
-   Default: ``1e6``
+   Default: ``1e9``
+
    Maximum amount of time for which the solver should run.
 
 .. member:: int Solver::Options::num_threads
@@ -1239,8 +1563,8 @@
 
    where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
    is projection onto the bounds constraints and :math:`\boxplus` is
-   Plus operation for the overall local parameterization associated
-   with the parameter vector.
+   Plus operation for the overall manifold associated with the
+   parameter vector.
 
 .. member:: double Solver::Options::parameter_tolerance
 
@@ -1260,7 +1584,7 @@
    Type of linear solver used to compute the solution to the linear
    least squares problem in each iteration of the Levenberg-Marquardt
    algorithm. If Ceres is built with support for ``SuiteSparse`` or
-   ``CXSparse`` or ``Eigen``'s sparse Cholesky factorization, the
+   ``Accelerate`` or ``Eigen``'s sparse Cholesky factorization, the
    default is ``SPARSE_NORMAL_CHOLESKY``, it is ``DENSE_QR``
    otherwise.
 
@@ -1271,8 +1595,9 @@
    The preconditioner used by the iterative linear solver. The default
    is the block Jacobi preconditioner. Valid values are (in increasing
    order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``,
-   ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See
-   :ref:`section-preconditioner` for more details.
+   ``CLUSTER_JACOBI``, ``CLUSTER_TRIDIAGONAL``, ``SUBSET`` and
+   ``SCHUR_POWER_SERIES_EXPANSION``. See :ref:`section-preconditioner`
+   for more details.
 
 .. member:: VisibilityClusteringType Solver::Options::visibility_clustering_type
 
@@ -1296,7 +1621,7 @@
    recommend that you try ``CANONICAL_VIEWS`` first and if it is too
    expensive try ``SINGLE_LINKAGE``.
 
-.. member:: std::unordered_set<ResidualBlockId> residual_blocks_for_subset_preconditioner
+.. member:: std::unordered_set<ResidualBlockId> Solver::Options::residual_blocks_for_subset_preconditioner
 
    ``SUBSET`` preconditioner is a preconditioner for problems with
    general sparsity. Given a subset of residual blocks of a problem,
@@ -1321,65 +1646,76 @@
 
 .. member:: DenseLinearAlgebraLibrary Solver::Options::dense_linear_algebra_library_type
 
-   Default:``EIGEN``
+   Default: ``EIGEN``
 
    Ceres supports using multiple dense linear algebra libraries for
-   dense matrix factorizations. Currently ``EIGEN`` and ``LAPACK`` are
-   the valid choices. ``EIGEN`` is always available, ``LAPACK`` refers
-   to the system ``BLAS + LAPACK`` library which may or may not be
-   available.
+   dense matrix factorizations. Currently ``EIGEN``, ``LAPACK`` and
+   ``CUDA`` are the valid choices. ``EIGEN`` is always available,
+   ``LAPACK`` refers to the system ``BLAS + LAPACK`` library which may
+   or may not be available. ``CUDA`` refers to Nvidia's GPU based
+   dense linear algebra library which may or may not be available.
 
    This setting affects the ``DENSE_QR``, ``DENSE_NORMAL_CHOLESKY``
-   and ``DENSE_SCHUR`` solvers. For small to moderate sized probem
+   and ``DENSE_SCHUR`` solvers. For small to moderate sized problems
    ``EIGEN`` is a fine choice but for large problems, an optimized
-   ``LAPACK + BLAS`` implementation can make a substantial difference
-   in performance.
+   ``LAPACK + BLAS`` or ``CUDA`` implementation can make a substantial
+   difference in performance.
 
 .. member:: SparseLinearAlgebraLibrary Solver::Options::sparse_linear_algebra_library_type
 
    Default: The highest available according to: ``SUITE_SPARSE`` >
-   ``CX_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``
+   ``ACCELERATE_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``
 
    Ceres supports the use of three sparse linear algebra libraries,
    ``SuiteSparse``, which is enabled by setting this parameter to
-   ``SUITE_SPARSE``, ``CXSparse``, which can be selected by setting
-   this parameter to ``CX_SPARSE`` and ``Eigen`` which is enabled by
-   setting this parameter to ``EIGEN_SPARSE``.  Lastly, ``NO_SPARSE``
-   means that no sparse linear solver should be used; note that this is
-   irrespective of whether Ceres was compiled with support for one.
+   ``SUITE_SPARSE``, ``Accelerate``, which can be selected by setting
+   this parameter to ``ACCELERATE_SPARSE`` and ``Eigen`` which is
+   enabled by setting this parameter to ``EIGEN_SPARSE``.  Lastly,
+   ``NO_SPARSE`` means that no sparse linear solver should be used;
+   note that this is irrespective of whether Ceres was compiled with
+   support for one.
 
-   ``SuiteSparse`` is a sophisticated and complex sparse linear
-   algebra library and should be used in general.
+   ``SuiteSparse`` is a sophisticated sparse linear algebra library
+   and should be used in general. On MacOS you may want to use the
+   ``Accelerate`` framework.
 
    If your needs/platforms prevent you from using ``SuiteSparse``,
-   consider using ``CXSparse``, which is a much smaller, easier to
-   build library. As can be expected, its performance on large
-   problems is not comparable to that of ``SuiteSparse``.
+   consider using the sparse linear algebra routines in ``Eigen``. The
+   sparse Cholesky algorithms currently included with ``Eigen`` are
+   not as sophisticated as the ones in ``SuiteSparse`` and
+   ``Accelerate`` and as a result its performance is considerably
+   worse.
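+
+   For example, a sketch of explicitly requesting a particular
+   combination (assuming Ceres was built with support for it):
+
+   .. code-block:: c++
+
+      options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
+      options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;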
 
-   Last but not the least you can use the sparse linear algebra
-   routines in ``Eigen``. Currently the performance of this library is
-   the poorest of the three. But this should change in the near
-   future.
+.. member:: LinearSolverOrderingType Solver::Options::linear_solver_ordering_type
 
-   Another thing to consider here is that the sparse Cholesky
-   factorization libraries in Eigen are licensed under ``LGPL`` and
-   building Ceres with support for ``EIGEN_SPARSE`` will result in an
-   LGPL licensed library (since the corresponding code from Eigen is
-   compiled into the library).
+   Default: ``AMD``
 
-   The upside is that you do not need to build and link to an external
-   library to use ``EIGEN_SPARSE``.
+   The order in which variables are eliminated in a linear solver can
+   have a significant impact on the efficiency and accuracy of the
+   method. For example, when doing sparse Cholesky factorization,
+   there are matrices for which a good ordering will give a Cholesky
+   factor with :math:`O(n)` storage, whereas a bad ordering will
+   result in a completely dense factor.
 
+    Sparse direct solvers like ``SPARSE_NORMAL_CHOLESKY`` and
+    ``SPARSE_SCHUR`` use a fill reducing ordering of the columns and
+    rows of the matrix being factorized before computing the numeric
+    factorization.
 
-.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering
+    This enum controls the type of algorithm used to compute this fill
+    reducing ordering. There is no single algorithm that works on all
+    matrices, so determining which algorithm works better is a matter
+    of empirical experimentation.
 
-   Default: ``NULL``
+.. member:: std::shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering
+
+   Default: ``nullptr``
 
    An instance of the ordering object informs the solver about the
    desired order in which parameter blocks should be eliminated by the
-   linear solvers. See section~\ref{sec:ordering`` for more details.
+   linear solvers.
 
-   If ``NULL``, the solver is free to choose an ordering that it
+   If ``nullptr``, the solver is free to choose an ordering that it
    thinks is best.
 
    See :ref:`section-ordering` for more details.
@@ -1404,41 +1740,15 @@
    efficient to explicitly compute it and use it for evaluating the
    matrix-vector products.
 
-   Enabling this option tells ``ITERATIVE_SCHUR`` to use an explicitly
-   computed Schur complement. This can improve the performance of the
-   ``ITERATIVE_SCHUR`` solver significantly.
-
-   .. NOTE:
+   .. NOTE::
 
      This option can only be used with the ``SCHUR_JACOBI``
      preconditioner.
 
-.. member:: bool Solver::Options::use_post_ordering
+.. member:: bool Solver::Options::dynamic_sparsity
 
    Default: ``false``
 
-   Sparse Cholesky factorization algorithms use a fill-reducing
-   ordering to permute the columns of the Jacobian matrix. There are
-   two ways of doing this.
-
-   1. Compute the Jacobian matrix in some order and then have the
-      factorization algorithm permute the columns of the Jacobian.
-
-   2. Compute the Jacobian with its columns already permuted.
-
-   The first option incurs a significant memory penalty. The
-   factorization algorithm has to make a copy of the permuted Jacobian
-   matrix, thus Ceres pre-permutes the columns of the Jacobian matrix
-   and generally speaking, there is no performance penalty for doing
-   so.
-
-   In some rare cases, it is worth using a more complicated reordering
-   algorithm which has slightly better runtime performance at the
-   expense of an extra copy of the Jacobian matrix. Setting
-   ``use_postordering`` to ``true`` enables this tradeoff.
-
-.. member:: bool Solver::Options::dynamic_sparsity
-
    Some non-linear least squares problems are symbolically dense but
    numerically sparse. i.e. at any given state only a small number of
    Jacobian entries are non-zero, but the position and number of
@@ -1453,6 +1763,29 @@
 
    This setting only affects the `SPARSE_NORMAL_CHOLESKY` solver.
 
+.. member:: bool Solver::Options::use_mixed_precision_solves
+
+   Default: ``false``
+
+   If true, the Gauss-Newton matrix is computed in *double* precision,
+   but its factorization is computed in *single* precision. This can
+   result in significant time and memory savings at the cost of some
+   accuracy in the Gauss-Newton step. Iterative refinement is used to
+   recover some of this accuracy.
+
+   If ``use_mixed_precision_solves`` is true, we recommend setting
+   ``max_num_refinement_iterations`` to 2-3.
+
+   See :ref:`section-mixed-precision` for more details.
+
+.. member:: int Solver::Options::max_num_refinement_iterations
+
+   Default: ``0``
+
+   Number of steps of the iterative refinement process to run when
+   computing the Gauss-Newton step; see
+   :member:`Solver::Options::use_mixed_precision_solves`.
+
 .. member:: int Solver::Options::min_linear_solver_iterations
 
    Default: ``0``
@@ -1469,6 +1802,34 @@
    makes sense when the linear solver is an iterative solver, e.g.,
    ``ITERATIVE_SCHUR`` or ``CGNR``.
 
+.. member:: int Solver::Options::max_num_spse_iterations
+
+   Default: ``5``
+
+   Maximum number of iterations performed by
+   ``SCHUR_POWER_SERIES_EXPANSION``. Each iteration corresponds to one
+   more term in the power series expansion of the inverse of the Schur
+   complement.  This value controls the maximum number of iterations
+   whether it is used as a preconditioner or just to initialize the
+   solution for ``ITERATIVE_SCHUR``.
+
+.. member:: bool Solver::Options::use_spse_initialization
+
+   Default: ``false``
+
+   Use Schur power series expansion to initialize the solution for
+   ``ITERATIVE_SCHUR``. This option can be set to ``true`` regardless
+   of what preconditioner is being used.
+
+.. member:: double Solver::Options::spse_tolerance
+
+   Default: ``0.1``
+
+   When ``use_spse_initialization`` is ``true``, this parameter along
+   with ``max_num_spse_iterations`` controls the number of
+   ``SCHUR_POWER_SERIES_EXPANSION`` iterations performed for
+   initialization. It is not used to control the preconditioner.
+
 .. member:: double Solver::Options::eta
 
    Default: ``1e-1``
@@ -1502,6 +1863,25 @@
    objects that have an :class:`EvaluationCallback` associated with
    them.
 
+.. member:: std::shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
+
+   Default: ``nullptr``
+
+   If :member:`Solver::Options::use_inner_iterations` is true, then
+   the user has two choices.
+
+   1. Let the solver heuristically decide which parameter blocks to
+      optimize in each inner iteration. To do this, set
+      :member:`Solver::Options::inner_iteration_ordering` to ``nullptr``.
+
+   2. Specify a collection of ordered independent sets. The lower
+      numbered groups are optimized before the higher numbered groups
+      during the inner optimization phase. Each group must be an
+      independent set. Not all parameter blocks need to be included in
+      the ordering.
+
+   See :ref:`section-ordering` for more details.
+
 .. member:: double Solver::Options::inner_iteration_tolerance
 
    Default: ``1e-3``
@@ -1516,37 +1896,21 @@
    inner iterations in subsequent trust region minimizer iterations is
    disabled.
 
-.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
-
-   Default: ``NULL``
-
-   If :member:`Solver::Options::use_inner_iterations` true, then the
-   user has two choices.
-
-   1. Let the solver heuristically decide which parameter blocks to
-      optimize in each inner iteration. To do this, set
-      :member:`Solver::Options::inner_iteration_ordering` to ``NULL``.
-
-   2. Specify a collection of of ordered independent sets. The lower
-      numbered groups are optimized before the higher number groups
-      during the inner optimization phase. Each group must be an
-      independent set. Not all parameter blocks need to be included in
-      the ordering.
-
-   See :ref:`section-ordering` for more details.
 
 .. member:: LoggingType Solver::Options::logging_type
 
    Default: ``PER_MINIMIZER_ITERATION``
 
+   Valid values are ``SILENT`` and ``PER_MINIMIZER_ITERATION``.
+
 .. member:: bool Solver::Options::minimizer_progress_to_stdout
 
    Default: ``false``
 
-   By default the :class:`Minimizer` progress is logged to ``STDERR``
+   By default the Minimizer's progress is logged to ``STDERR``
    depending on the ``vlog`` level. If this flag is set to true, and
-   :member:`Solver::Options::logging_type` is not ``SILENT``, the logging
-   output is sent to ``STDOUT``.
+   :member:`Solver::Options::logging_type` is not ``SILENT``, the
+   logging output is sent to ``STDOUT``.
 
    For ``TRUST_REGION_MINIMIZER`` the progress display looks like
 
@@ -1595,7 +1959,7 @@
    #. ``it`` is the time taken by the current iteration.
    #. ``tt`` is the total time taken by the minimizer.
 
-.. member:: vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
+.. member:: std::vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
 
    Default: ``empty``
 
@@ -1603,7 +1967,7 @@
    the trust region problem. Useful for testing and benchmarking. If
    ``empty``, no problems are dumped.
 
-.. member:: string Solver::Options::trust_region_problem_dump_directory
+.. member:: std::string Solver::Options::trust_region_problem_dump_directory
 
    Default: ``/tmp``
 
@@ -1614,7 +1978,7 @@
     :member:`Solver::Options::trust_region_problem_dump_format_type` is not
     ``CONSOLE``.
 
-.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format
+.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format_type
 
    Default: ``TEXTFILE``
 
@@ -1708,23 +2072,32 @@
    should not expect to look at the parameter blocks and interpret
    their values.
 
-.. member:: vector<IterationCallback> Solver::Options::callbacks
+.. member:: std::vector<IterationCallback*> Solver::Options::callbacks
+
+   Default: ``empty``
 
    Callbacks that are executed at the end of each iteration of the
-   :class:`Minimizer`. They are executed in the order that they are
+   minimizer. They are executed in the order that they are
    specified in this vector.
 
    By default, parameter blocks are updated only at the end of the
-   optimization, i.e., when the :class:`Minimizer` terminates. This
-   means that by default, if an :class:`IterationCallback` inspects
-   the parameter blocks, they will not see them changing in the course
-   of the optimization.
+   optimization, i.e., when the minimizer terminates. This means that
+   by default, if an :class:`IterationCallback` inspects the parameter
+   blocks, they will not see them changing in the course of the
+   optimization.
 
    To tell Ceres to update the parameter blocks at the end of each
    iteration and before calling the user's callback, set
    :member:`Solver::Options::update_state_every_iteration` to
    ``true``.
 
+   See `examples/iteration_callback_example.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/iteration_callback_example.cc>`_
+   for an example of an :class:`IterationCallback` that uses
+   :member:`Solver::Options::update_state_every_iteration` to log
+   changes to the parameter blocks over the course of the
+   optimization.
+
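+   A minimal sketch of registering a callback; here ``my_callback`` is
+   a hypothetical :class:`IterationCallback` instance that must
+   outlive the call to ``Solve``:
+
+   .. code-block:: c++
+
+      options.update_state_every_iteration = true;
+      options.callbacks.push_back(&my_callback);
+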
    The solver does NOT take ownership of these pointers.
 
 :class:`ParameterBlockOrdering`
@@ -1786,8 +2159,8 @@
 
    Number of groups with one or more elements.
 
-:class:`IterationCallback`
-==========================
+:class:`IterationSummary`
+=========================
 
 .. class:: IterationSummary
 
@@ -1863,7 +2236,7 @@
 
    Size of the trust region at the end of the current iteration. For
    the Levenberg-Marquardt algorithm, the regularization parameter is
-   1.0 / member::`IterationSummary::trust_region_radius`.
+   1.0 / :member:`IterationSummary::trust_region_radius`.
 
 .. member:: double IterationSummary::eta
 
@@ -1906,6 +2279,8 @@
 
    Time (in seconds) since the user called Solve().
 
+:class:`IterationCallback`
+==========================
 
 .. class:: IterationCallback
 
@@ -1939,6 +2314,10 @@
   #. ``SOLVER_CONTINUE`` indicates that the solver should continue
      optimizing.
 
+  The return values can be used to implement custom termination
+  criteria that supersede the iteration/time/tolerance based
+  termination implemented by Ceres.
+
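+  As an illustration, a sketch of a callback that stops the solve once
+  the cost falls below a user chosen (hypothetical) threshold:
+
+  .. code-block:: c++
+
+    class CostThresholdCallback : public ceres::IterationCallback {
+     public:
+      explicit CostThresholdCallback(double threshold)
+          : threshold_(threshold) {}
+
+      ceres::CallbackReturnType operator()(
+          const ceres::IterationSummary& summary) override {
+        return summary.cost < threshold_
+                   ? ceres::SOLVER_TERMINATE_SUCCESSFULLY
+                   : ceres::SOLVER_CONTINUE;
+      }
+
+     private:
+      double threshold_;
+    };
+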
   For example, the following :class:`IterationCallback` is used
   internally by Ceres to log the progress of the optimization.
 
@@ -1978,6 +2357,12 @@
     };
 
 
+  See `examples/evaluation_callback_example.cc
+  <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/evaluation_callback_example.cc>`_
+  for another example that uses
+  :member:`Solver::Options::update_state_every_iteration` to log
+  changes to the parameter blocks over the course of the optimization.
+
 
 :class:`CRSMatrix`
 ==================
@@ -1995,13 +2380,13 @@
 
    Number of columns.
 
-.. member:: vector<int> CRSMatrix::rows
+.. member:: std::vector<int> CRSMatrix::rows
 
    :member:`CRSMatrix::rows` is a :member:`CRSMatrix::num_rows` + 1
    sized array that points into the :member:`CRSMatrix::cols` and
    :member:`CRSMatrix::values` array.
 
-.. member:: vector<int> CRSMatrix::cols
+.. member:: std::vector<int> CRSMatrix::cols
 
    :member:`CRSMatrix::cols` contain as many entries as there are
    non-zeros in the matrix.
@@ -2009,7 +2394,7 @@
    For each row ``i``, ``cols[rows[i]]`` ... ``cols[rows[i + 1] - 1]``
    are the indices of the non-zero columns of row ``i``.
 
-.. member:: vector<int> CRSMatrix::values
+.. member:: std::vector<double> CRSMatrix::values
 
    :member:`CRSMatrix::values` contain as many entries as there are
    non-zeros in the matrix.
@@ -2043,12 +2428,12 @@
 
    Summary of the various stages of the solver after termination.
 
-.. function:: string Solver::Summary::BriefReport() const
+.. function:: std::string Solver::Summary::BriefReport() const
 
    A brief one line description of the state of the solver after
    termination.
 
-.. function:: string Solver::Summary::FullReport() const
+.. function:: std::string Solver::Summary::FullReport() const
 
    A full multiline description of the state of the solver after
    termination.
@@ -2071,7 +2456,7 @@
 
    The cause of the minimizer terminating.
 
-.. member:: string Solver::Summary::message
+.. member:: std::string Solver::Summary::message
 
    Reason why the solver terminated.
 
@@ -2091,14 +2476,14 @@
    were held fixed by the preprocessor because all the parameter
    blocks that they depend on were fixed.
 
-.. member:: vector<IterationSummary> Solver::Summary::iterations
+.. member:: std::vector<IterationSummary> Solver::Summary::iterations
 
    :class:`IterationSummary` for each minimizer iteration in order.
 
 .. member:: int Solver::Summary::num_successful_steps
 
    Number of minimizer iterations in which the step was
-   accepted. Unless :member:`Solver::Options::use_non_monotonic_steps`
+   accepted. Unless :member:`Solver::Options::use_nonmonotonic_steps`
    is `true` this is also the number of steps in which the objective
    function value/cost went down.
 
@@ -2112,13 +2497,13 @@
 
    Number of times inner iterations were performed.
 
- .. member:: int Solver::Summary::num_line_search_steps
+.. member:: int Solver::Summary::num_line_search_steps
 
-    Total number of iterations inside the line search algorithm across
-    all invocations. We call these iterations "steps" to distinguish
-    them from the outer iterations of the line search and trust region
-    minimizer algorithms which call the line search algorithm as a
-    subroutine.
+   Total number of iterations inside the line search algorithm across
+   all invocations. We call these iterations "steps" to distinguish
+   them from the outer iterations of the line search and trust region
+   minimizer algorithms which call the line search algorithm as a
+   subroutine.
 
 .. member:: double Solver::Summary::preprocessor_time_in_seconds
 
@@ -2126,7 +2511,7 @@
 
 .. member:: double Solver::Summary::minimizer_time_in_seconds
 
-   Time (in seconds) spent in the Minimizer.
+   Time (in seconds) spent in the minimizer.
 
 .. member:: double Solver::Summary::postprocessor_time_in_seconds
 
@@ -2180,7 +2565,7 @@
    Dimension of the tangent space of the problem (or the number of
    columns in the Jacobian for the problem). This is different from
    :member:`Solver::Summary::num_parameters` if a parameter block is
-   associated with a :class:`LocalParameterization`.
+   associated with a :class:`Manifold`.
 
 .. member:: int Solver::Summary::num_residual_blocks
 
@@ -2206,7 +2591,7 @@
    number of columns in the Jacobian for the reduced problem). This is
    different from :member:`Solver::Summary::num_parameters_reduced` if
    a parameter block in the reduced problem is associated with a
-   :class:`LocalParameterization`.
+   :class:`Manifold`.
 
 .. member:: int Solver::Summary::num_residual_blocks_reduced
 
@@ -2224,9 +2609,7 @@
 .. member:: int Solver::Summary::num_threads_used
 
    Number of threads actually used by the solver for Jacobian and
-   residual evaluation. This number is not equal to
-   :member:`Solver::Summary::num_threads_given` if none of `OpenMP`
-   or `CXX_THREADS` is available.
+   residual evaluation.
 
 .. member:: LinearSolverType Solver::Summary::linear_solver_type_given
 
@@ -2242,12 +2625,12 @@
    `SPARSE_NORMAL_CHOLESKY` but no sparse linear algebra library was
    available.
 
-.. member:: vector<int> Solver::Summary::linear_solver_ordering_given
+.. member:: std::vector<int> Solver::Summary::linear_solver_ordering_given
 
    Size of the elimination groups given by the user as hints to the
    linear solver.
 
-.. member:: vector<int> Solver::Summary::linear_solver_ordering_used
+.. member:: std::vector<int> Solver::Summary::linear_solver_ordering_used
 
    Size of the parameter groups used by the solver when ordering the
    columns of the Jacobian. This may be different from
@@ -2285,12 +2668,12 @@
    actually performed. For example, in a problem with just one parameter
    block, inner iterations are not performed.
 
-.. member:: vector<int> inner_iteration_ordering_given
+.. member:: std::vector<int> Solver::Summary::inner_iteration_ordering_given
 
    Size of the parameter groups given by the user for performing inner
    iterations.
 
-.. member:: vector<int> inner_iteration_ordering_used
+.. member:: std::vector<int> Solver::Summary::inner_iteration_ordering_used
 
    Size of the parameter groups used by the solver for
    performing inner iterations. This may be different from
@@ -2315,7 +2698,7 @@
 
    Type of clustering algorithm used for visibility based
    preconditioning. Only meaningful when the
-   :member:`Solver::Summary::preconditioner_type` is
+   :member:`Solver::Summary::preconditioner_type_used` is
    ``CLUSTER_JACOBI`` or ``CLUSTER_TRIDIAGONAL``.
 
 .. member:: TrustRegionStrategyType Solver::Summary::trust_region_strategy_type
diff --git a/docs/source/nnls_tutorial.rst b/docs/source/nnls_tutorial.rst
index 6c89032..6de800e 100644
--- a/docs/source/nnls_tutorial.rst
+++ b/docs/source/nnls_tutorial.rst
@@ -2,6 +2,8 @@
 
 .. default-domain:: cpp
 
+.. cpp:namespace:: ceres
+
 .. _chapter-nnls_tutorial:
 
 ========================
@@ -110,7 +112,7 @@
      // Set up the only cost function (also known as residual). This uses
      // auto-differentiation to obtain the derivative (jacobian).
      CostFunction* cost_function =
-         new AutoDiffCostFunction<CostFunctor, 1, 1>(new CostFunctor);
+         new AutoDiffCostFunction<CostFunctor, 1, 1>();
      problem.AddResidualBlock(cost_function, nullptr, &x);
 
      // Run the solver!
@@ -210,8 +212,7 @@
 .. code-block:: c++
 
   CostFunction* cost_function =
-    new NumericDiffCostFunction<NumericDiffCostFunctor, ceres::CENTRAL, 1, 1>(
-        new NumericDiffCostFunctor);
+    new NumericDiffCostFunction<NumericDiffCostFunctor, ceres::CENTRAL, 1, 1>();
   problem.AddResidualBlock(cost_function, nullptr, &x);
 
 Notice the parallel from when we were using automatic differentiation
@@ -219,7 +220,7 @@
 .. code-block:: c++
 
   CostFunction* cost_function =
-      new AutoDiffCostFunction<CostFunctor, 1, 1>(new CostFunctor);
+      new AutoDiffCostFunction<CostFunctor, 1, 1>();
   problem.AddResidualBlock(cost_function, nullptr, &x);
 
 The construction looks almost identical to the one used for automatic
@@ -355,16 +356,16 @@
 
   Problem problem;
 
-  // Add residual terms to the problem using the using the autodiff
+  // Add residual terms to the problem using the autodiff
   // wrapper to get the derivatives automatically.
   problem.AddResidualBlock(
-    new AutoDiffCostFunction<F1, 1, 1, 1>(new F1), nullptr, &x1, &x2);
+    new AutoDiffCostFunction<F1, 1, 1, 1>(), nullptr, &x1, &x2);
   problem.AddResidualBlock(
-    new AutoDiffCostFunction<F2, 1, 1, 1>(new F2), nullptr, &x3, &x4);
+    new AutoDiffCostFunction<F2, 1, 1, 1>(), nullptr, &x3, &x4);
   problem.AddResidualBlock(
-    new AutoDiffCostFunction<F3, 1, 1, 1>(new F3), nullptr, &x2, &x3)
+    new AutoDiffCostFunction<F3, 1, 1, 1>(), nullptr, &x2, &x3);
   problem.AddResidualBlock(
-    new AutoDiffCostFunction<F4, 1, 1, 1>(new F4), nullptr, &x1, &x4);
+    new AutoDiffCostFunction<F4, 1, 1, 1>(), nullptr, &x1, &x4);
 
 
 Note that each ``ResidualBlock`` only depends on the two parameters
@@ -377,62 +378,65 @@
 
     Initial x1 = 3, x2 = -1, x3 = 0, x4 = 1
     iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
-       0  1.075000e+02    0.00e+00    1.55e+02   0.00e+00   0.00e+00  1.00e+04       0    4.95e-04    2.30e-03
-       1  5.036190e+00    1.02e+02    2.00e+01   2.16e+00   9.53e-01  3.00e+04       1    4.39e-05    2.40e-03
-       2  3.148168e-01    4.72e+00    2.50e+00   6.23e-01   9.37e-01  9.00e+04       1    9.06e-06    2.43e-03
-       3  1.967760e-02    2.95e-01    3.13e-01   3.08e-01   9.37e-01  2.70e+05       1    8.11e-06    2.45e-03
-       4  1.229900e-03    1.84e-02    3.91e-02   1.54e-01   9.37e-01  8.10e+05       1    6.91e-06    2.48e-03
-       5  7.687123e-05    1.15e-03    4.89e-03   7.69e-02   9.37e-01  2.43e+06       1    7.87e-06    2.50e-03
-       6  4.804625e-06    7.21e-05    6.11e-04   3.85e-02   9.37e-01  7.29e+06       1    5.96e-06    2.52e-03
-       7  3.003028e-07    4.50e-06    7.64e-05   1.92e-02   9.37e-01  2.19e+07       1    5.96e-06    2.55e-03
-       8  1.877006e-08    2.82e-07    9.54e-06   9.62e-03   9.37e-01  6.56e+07       1    5.96e-06    2.57e-03
-       9  1.173223e-09    1.76e-08    1.19e-06   4.81e-03   9.37e-01  1.97e+08       1    7.87e-06    2.60e-03
-      10  7.333425e-11    1.10e-09    1.49e-07   2.40e-03   9.37e-01  5.90e+08       1    6.20e-06    2.63e-03
-      11  4.584044e-12    6.88e-11    1.86e-08   1.20e-03   9.37e-01  1.77e+09       1    6.91e-06    2.65e-03
-      12  2.865573e-13    4.30e-12    2.33e-09   6.02e-04   9.37e-01  5.31e+09       1    5.96e-06    2.67e-03
-      13  1.791438e-14    2.69e-13    2.91e-10   3.01e-04   9.37e-01  1.59e+10       1    7.15e-06    2.69e-03
+       0  1.075000e+02    0.00e+00    1.55e+02   0.00e+00   0.00e+00  1.00e+04        0    2.91e-05    3.40e-04
+       1  5.036190e+00    1.02e+02    2.00e+01   0.00e+00   9.53e-01  3.00e+04        1    4.98e-05    3.99e-04
+       2  3.148168e-01    4.72e+00    2.50e+00   6.23e-01   9.37e-01  9.00e+04        1    2.15e-06    4.06e-04
+       3  1.967760e-02    2.95e-01    3.13e-01   3.08e-01   9.37e-01  2.70e+05        1    9.54e-07    4.10e-04
+       4  1.229900e-03    1.84e-02    3.91e-02   1.54e-01   9.37e-01  8.10e+05        1    1.91e-06    4.14e-04
+       5  7.687123e-05    1.15e-03    4.89e-03   7.69e-02   9.37e-01  2.43e+06        1    1.91e-06    4.18e-04
+       6  4.804625e-06    7.21e-05    6.11e-04   3.85e-02   9.37e-01  7.29e+06        1    1.19e-06    4.21e-04
+       7  3.003028e-07    4.50e-06    7.64e-05   1.92e-02   9.37e-01  2.19e+07        1    1.91e-06    4.25e-04
+       8  1.877006e-08    2.82e-07    9.54e-06   9.62e-03   9.37e-01  6.56e+07        1    9.54e-07    4.28e-04
+       9  1.173223e-09    1.76e-08    1.19e-06   4.81e-03   9.37e-01  1.97e+08        1    9.54e-07    4.32e-04
+      10  7.333425e-11    1.10e-09    1.49e-07   2.40e-03   9.37e-01  5.90e+08        1    9.54e-07    4.35e-04
+      11  4.584044e-12    6.88e-11    1.86e-08   1.20e-03   9.37e-01  1.77e+09        1    9.54e-07    4.38e-04
+      12  2.865573e-13    4.30e-12    2.33e-09   6.02e-04   9.37e-01  5.31e+09        1    2.15e-06    4.42e-04
+      13  1.791438e-14    2.69e-13    2.91e-10   3.01e-04   9.37e-01  1.59e+10        1    1.91e-06    4.45e-04
+      14  1.120029e-15    1.68e-14    3.64e-11   1.51e-04   9.37e-01  4.78e+10        1    2.15e-06    4.48e-04
 
-    Ceres Solver v1.12.0 Solve Report
-    ----------------------------------
+    Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.1.0)-metis-(5.1.0)-acceleratesparse-eigensparse)
+
                                          Original                  Reduced
     Parameter blocks                            4                        4
     Parameters                                  4                        4
     Residual blocks                             4                        4
-    Residual                                    4                        4
+    Residuals                                   4                        4
 
     Minimizer                        TRUST_REGION
 
     Dense linear algebra library            EIGEN
     Trust region strategy     LEVENBERG_MARQUARDT
-
                                             Given                     Used
     Linear solver                        DENSE_QR                 DENSE_QR
     Threads                                     1                        1
-    Linear solver threads                       1                        1
+    Linear solver ordering              AUTOMATIC                        4
 
     Cost:
     Initial                          1.075000e+02
-    Final                            1.791438e-14
+    Final                            1.120029e-15
     Change                           1.075000e+02
 
-    Minimizer iterations                       14
-    Successful steps                           14
+    Minimizer iterations                       15
+    Successful steps                           15
     Unsuccessful steps                          0
 
     Time (in seconds):
-    Preprocessor                            0.002
+    Preprocessor                         0.000311
 
-      Residual evaluation                   0.000
-      Jacobian evaluation                   0.000
-      Linear solver                         0.000
-    Minimizer                               0.001
+      Residual only evaluation           0.000002 (14)
+      Jacobian & residual evaluation     0.000023 (15)
+      Linear solver                      0.000043 (14)
+    Minimizer                            0.000163
 
-    Postprocessor                           0.000
-    Total                                   0.005
+    Postprocessor                        0.000012
+    Total                                0.000486
 
     Termination:                      CONVERGENCE (Gradient tolerance reached. Gradient max norm: 3.642190e-11 <= 1.000000e-10)
 
-    Final x1 = 0.000292189, x2 = -2.92189e-05, x3 = 4.79511e-05, x4 = 4.79511e-05
+    Final x1 = 0.000146222, x2 = -1.46222e-05, x3 = 2.40957e-05, x4 = 2.40957e-05
 
 It is easy to see that the optimal solution to this problem is at
 :math:`x_1=0, x_2=0, x_3=0, x_4=0` with an objective function value of
@@ -494,8 +498,8 @@
  Problem problem;
  for (int i = 0; i < kNumObservations; ++i) {
    CostFunction* cost_function =
-        new AutoDiffCostFunction<ExponentialResidual, 1, 1, 1>(
-            new ExponentialResidual(data[2 * i], data[2 * i + 1]));
+        new AutoDiffCostFunction<ExponentialResidual, 1, 1, 1>
+            (data[2 * i], data[2 * i + 1]);
    problem.AddResidualBlock(cost_function, nullptr, &m, &c);
  }
 
@@ -670,8 +674,8 @@
     // the client code.
     static ceres::CostFunction* Create(const double observed_x,
                                        const double observed_y) {
-      return (new ceres::AutoDiffCostFunction<SnavelyReprojectionError, 2, 9, 3>(
-                  new SnavelyReprojectionError(observed_x, observed_y)));
+      return new ceres::AutoDiffCostFunction<SnavelyReprojectionError, 2, 9, 3>
+        (observed_x, observed_y);
     }
 
    double observed_x;
@@ -728,7 +732,7 @@
 
 For a more sophisticated bundle adjustment example which demonstrates
 the use of Ceres' more advanced features including its various linear
-solvers, robust loss functions and local parameterizations see
+solvers, robust loss functions and manifolds see
 `examples/bundle_adjuster.cc
 <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/bundle_adjuster.cc>`_
 
@@ -862,23 +866,23 @@
    measurement and the predicted measurement is:
 
    .. math:: r_{ab} =
-	     \left[
-	     \begin{array}{c}
-	       R_a^T\left(p_b - p_a\right) - \hat{p}_{ab} \\
-	       \mathrm{Normalize}\left(\psi_b - \psi_a - \hat{\psi}_{ab}\right)
-	     \end{array}
-	     \right]
+             \left[
+             \begin{array}{c}
+               R_a^T\left(p_b - p_a\right) - \hat{p}_{ab} \\
+               \mathrm{Normalize}\left(\psi_b - \psi_a - \hat{\psi}_{ab}\right)
+             \end{array}
+             \right]
 
    where the function :math:`\mathrm{Normalize}()` normalizes the angle in the range
    :math:`[-\pi,\pi)`, and :math:`R` is the rotation matrix given by
 
    .. math:: R_a =
-	     \left[
-	     \begin{array}{cc}
-	       \cos \psi_a & -\sin \psi_a \\
-	       \sin \psi_a & \cos \psi_a \\
-	     \end{array}
-	     \right]
+             \left[
+             \begin{array}{cc}
+               \cos \psi_a & -\sin \psi_a \\
+               \sin \psi_a & \cos \psi_a \\
+             \end{array}
+             \right]
 
    To finish the cost function, we need to weight the residual by the
    uncertainty of the measurement. Hence, we pre-multiply the residual by the
@@ -886,10 +890,12 @@
    i.e. :math:`\Sigma_{ab}^{-\frac{1}{2}} r_{ab}` where :math:`\Sigma_{ab}` is
    the covariance.
 
-   Lastly, we use a local parameterization to normalize the orientation in the
-   range which is normalized between :math:`[-\pi,\pi)`.  Specially, we define
-   the :member:`AngleLocalParameterization::operator()` function to be:
-   :math:`\mathrm{Normalize}(\psi + \delta \psi)`.
+   Lastly, we use a manifold to normalize the orientation in the range
+   :math:`[-\pi,\pi)`. Specifically, we define the
+   :member:`AngleManifold::Plus()` function to be:
+   :math:`\mathrm{Normalize}(\psi + \Delta)` and
+   :member:`AngleManifold::Minus()` function to be
+   :math:`\mathrm{Normalize}(y) - \mathrm{Normalize}(x)`.
 
    This package includes an executable :member:`pose_graph_2d` that will read a
    problem definition file. This executable can work with any 2D problem
@@ -980,17 +986,18 @@
    i.e. :math:`\Sigma_{ab}^{-\frac{1}{2}} r_{ab}` where :math:`\Sigma_{ab}` is
    the covariance.
 
-   Given that we are using a quaternion to represent the orientation, we need to
-   use a local parameterization (:class:`EigenQuaternionParameterization`) to
+   Given that we are using a quaternion to represent the orientation,
+   we need to use a manifold (:class:`EigenQuaternionManifold`) to
    only apply updates orthogonal to the 4-vector defining the
-   quaternion. Eigen's quaternion uses a different internal memory layout for
-   the elements of the quaternion than what is commonly used. Specifically,
-   Eigen stores the elements in memory as :math:`[x, y, z, w]` where the real
-   part is last whereas it is typically stored first. Note, when creating an
-   Eigen quaternion through the constructor the elements are accepted in
-   :math:`w`, :math:`x`, :math:`y`, :math:`z` order. Since Ceres operates on
-   parameter blocks which are raw double pointers this difference is important
-   and requires a different parameterization.
+   quaternion. Eigen's quaternion uses a different internal memory
+   layout for the elements of the quaternion than what is commonly
+   used. Specifically, Eigen stores the elements in memory as
+   :math:`[x, y, z, w]` where the real part is last whereas it is
+   typically stored first. Note, when creating an Eigen quaternion
+   through the constructor the elements are accepted in :math:`w`,
+   :math:`x`, :math:`y`, :math:`z` order. Since Ceres operates on
+   parameter blocks which are raw double pointers this difference is
+   important and requires a different parameterization.
 
    This package includes an executable :member:`pose_graph_3d` that will read a
    problem definition file. This executable can work with any 3D problem
diff --git a/docs/source/numerical_derivatives.rst b/docs/source/numerical_derivatives.rst
index 57b46bf..8d7fb3a 100644
--- a/docs/source/numerical_derivatives.rst
+++ b/docs/source/numerical_derivatives.rst
@@ -61,8 +61,7 @@
   }
 
   CostFunction* cost_function =
-    new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
-      new Rat43CostFunctor(x, y));
+    new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(x, y);
 
 This is about the minimum amount of work one can expect to do to
 define the cost function. The only thing that the user needs to do is
@@ -326,7 +325,7 @@
 Compared to the *correct* value :math:`Df(1.0) = 140.73773557129658`,
 :math:`A(5, 1)` has a relative error of :math:`10^{-13}`. For
 comparison, the relative error for the central difference formula with
-the same stepsize (:math:`0.01/2^4 = 0.000625`) is :math:`10^{-5}`.
+the same step size (:math:`0.01/2^4 = 0.000625`) is :math:`10^{-5}`.
 
 The above tableau is the basis of Ridders' method for numeric
 differentiation. The full implementation is an adaptive scheme that
diff --git a/docs/source/solving_faqs.rst b/docs/source/solving_faqs.rst
index 64604c4..3842e4d 100644
--- a/docs/source/solving_faqs.rst
+++ b/docs/source/solving_faqs.rst
@@ -23,16 +23,13 @@
 
    2. For general sparse problems (i.e., the Jacobian matrix has a
       substantial number of zeros) use
-      ``SPARSE_NORMAL_CHOLESKY``. This requires that you have
-      ``SuiteSparse`` or ``CXSparse`` installed.
+      ``SPARSE_NORMAL_CHOLESKY``.
 
    3. For bundle adjustment problems with up to a hundred or so
       cameras, use ``DENSE_SCHUR``.
 
    4. For larger bundle adjustment problems with sparse Schur
-      Complement/Reduced camera matrices use ``SPARSE_SCHUR``. This
-      requires that you build Ceres with support for ``SuiteSparse``,
-      ``CXSparse`` or Eigen's sparse linear algebra libraries.
+      Complement/Reduced camera matrices use ``SPARSE_SCHUR``.
 
       If you do not have access to these libraries for whatever
       reason, ``ITERATIVE_SCHUR`` with ``SCHUR_JACOBI`` is an
diff --git a/docs/source/version_history.rst b/docs/source/version_history.rst
index 72ae832..50f8353 100644
--- a/docs/source/version_history.rst
+++ b/docs/source/version_history.rst
@@ -1,9 +1,320 @@
+.. default-domain:: cpp
+
+.. highlight:: c++
+
+.. cpp:namespace:: ceres
+
+
 .. _chapter-version-history:
 
 ===============
 Version History
 ===============
 
+2.2.0
+=====
+
+New Features
+------------
+
+#. Substantial improvement to threading performance across the board
+   (Dmitry Korchemkin)
+#. Mixed precision solves + iterative refinement when using ``CUDA`` or
+   CPU based dense linear solvers, or ``EIGEN_SPARSE`` as the sparse
+   linear algebra library. (Sameer Agarwal & Joydeep Biswas)
+#. Cuda based CGNR and preconditioner support (Joydeep Biswas & Sameer
+   Agarwal)
+#. Nested Dissection (``NESDIS``) is now supported as an ordering method
+   in addition to ``AMD``. (Sameer Agarwal, Alex Stewart & Sergiu
+   Deitsch)
+#. **Power Bundle Adjustment** is available as a linear solver and as
+   a preconditioner by the name of ``SCHUR_POWER_SERIES_EXPANSION``
+   (Mark Shachkov).
+#. Generalized Euler Angle conversions (hs293go@)
+
+
+Backward Incompatible API Changes
+---------------------------------
+
+#. :class:`LocalParameterization` has been removed, use
+   :class:`Manifold` instead.
+#. Ceres Solver now requires a C++17 compliant compiler.
+#. Ceres Solver now requires CMake version 3.16 or later.
+#. Ceres Solver now requires SuiteSparse version 4.5.6 or later.
+#. OpenMP and NO_THREADING backends have been removed. C++ threads is
+   how all threading is done.
+#. Support for ``CX_SPARSE`` as a sparse linear algebra backend has
+   been removed. Similar or better performance can be expected from
+   ``Eigen`` as the sparse linear algebra library.
+
+
+Bug Fixes & Minor Changes
+-------------------------
+#. Optimize the computation of the LM diagonal in TinySolver
+#. Improvements to multi-threaded performance for small problems that
+   had regressed due to changes to threading (Dmitriy Korchemkin)
+#. Fix handling of M_PI for MSVC (Sergiu Deitsch)
+#. Add a default value for Solver::Summary::linear_solver_ordering_type (Sameer Agarwal)
+#. Make sure that the code compiles well with CUDA 11 (Dmitriy
+   Korchemkin)
+#. Rework MSVC warning suppression (Sergiu Deitsch)
+#. Add an example for EvaluationCallback (Sameer Agarwal)
+#. Add an example for IterationCallback (Sameer Agarwal)
+#. Add end-to-end BA tests for SCHUR_POWER_SERIES_EXPANSION (Sameer Agarwal)
+#. Update documentation for linear solvers (Sameer Agarwal)
+#. Add an accessor for the CostFunctor in DynamicAutoDiffCostFunction (Sameer Agarwal)
+#. Runtime check for cudaMallocAsync support (Dmitriy Korchemkin)
+#. Remove cuda-memcheck based tests (Sameer Agarwal)
+#. Modernize ``Sphinx`` related CMake handling as well as the ``Sphinx``
+   build process in the terminal. (Sergiu Deitsch)
+#. Fix macOS ``sprintf`` security-related warnings (Sergiu Deitsch)
+#. Lots of CUDA related build system fixes (Sergiu Deitsch, Dmitriy
+   Korchemkin, Jason Mak)
+#. Improved Windows build support (Sergiu Deitsch)
+#. Various documentation fixes (Maxim Smolskiy, Evan Levine)
+#. Improved handling of large Jacobians (Sameer Agarwal)
+#. Improved handling of infinite initial cost (Sameer Agarwal)
+#. Improved traits support for Jets (Sameer Agarwal)
+#. Improved tests for Euler angle conversion routines (@Hs293Go)
+#. Use a std::tuple to store ProductManifold for better efficiency
+   (Sergiu Deitsch)
+#. Allow default construction of ProductManifold when underlying
+   manifolds have default constructors (Sergiu Deitsch)
+#. Move LineManifold and SphereManifold into their own headers (Sameer
+   Agarwal)
+#. Fix a byte vs number of elements error when dealing with CUDA
+   workspace computations (Joydeep Biswas)
+#. Hide and prevent internal symbols from being exported (Sergiu
+   Deitsch)
+#. Switch to imported SuiteSparse, CXSparse & METIS targets.
+#. Improve compilation on Ubuntu 20.04 (Sergiu Deitsch)
+#. Update to using gtest 1.11.0 (Sameer Agarwal)
+#. Fix Euler angle conversion code so that it does not rely on
+   constexpr constructors for Jets. (Sameer Agarwal)
+#. BlockRandomAccessSparseMatrix now uses a BlockSparseMatrix as
+   storage instead of TripletSparseMatrix. (Dmitriy Korchemkin)
+#. Deduction guide for DynamicAutoDiffCostFunction (Sergiu Deitsch)
+#. Explicit conversions from long to ints (Alexander Ivanov)
+#. Unused code deletion/commenting and code modernization (Alexander
+   Ivanov)
+#. Improve the bazel build & tests (Alexander Ivanov)
+#. Fix a bug in QuaternionRotatePoint introduced by the use of hypot
+   earlier in this release cycle (Jonathan Taylor & Sameer Agarwal)
+#. Lots of GitHub CI improvements (Sergiu Deitsch & Dmitriy Korchemkin)
+#. Improve the robustness of the Cuda based dense linear algebra tests
+   (Joydeep Biswas)
+#. Refactor storage & threading support in BlockRandomAccessMatrix and
+   its subclasses (Sameer Agarwal)
+#. Fix a bug in CoordinateDescentMinimizer related to uninitialized
+   variables (Sameer Agarwal)
+#. Remove OpenMP and NO_THREADS backends. (Sameer Agarwal)
+#. Fix version string parsing starting with SuiteSparse 6.0 (Sergiu
+   Deitsch)
+#. Use FindCUDAToolkit for CMake >= 3.17 (Alex Stewart)
+#. Add a const accessor for the Problem::Options struct used by
+   Problem. (Alex Stewart)
+#. Fix a serious performance regression when using SuiteSparse
+   introduced in `d09f7e9d5e
+   <https://github.com/ceres-solver/ceres-solver/commit/d09f7e9d5e3bfab2d7ec7e81fd6a55786edca17a>`_. (Sameer
+   Agarwal)
+#. Fix the build on QNX (Alex Stewart)
+#. Improve testing macros and documentation for Manifolds (Alex
+   Stewart)
+#. Improved code formatting (Tyler Hovanec)
+#. Better use of std::unique_ptr in the code (Mike Vitus)
+#. Fix a memory leak in ContextImpl (Sameer Agarwal)
+#. Faster locking when num_threads = 1 (Sameer Agarwal)
+#. Fix how x_norm is computed in TrustRegionMinimizer (Sameer Agarwal)
+#. Faster JACOBI preconditioner for CGNR (Sameer Agarwal)
+#. Convert internal enums to class enums (Sameer Agarwal)
+#. Improve the code in small_blas to be more compiler friendly (Sameer
+   Agarwal)
+#. Add the ability to specify the pivot threshold in
+   :class:`Covariance::Options` (Sameer Agarwal)
+#. Modernize the internals to use C++17 (Sameer Agarwal)
+#. Choose SPMV algorithm based on the CUDA SDK Version (Joydeep
+   Biswas)
+#. Better defaults in ``bundle_adjuster.cc`` (Sameer Agarwal)
+#. Use ``foo.data()`` instead of ``&foo[0]`` (Sameer Agarwal)
+#. Fix GCC 12.1.1 LTO -Walloc-size-larger-than= warnings (Sergiu
+   Deitsch)
+#. Improved determinism in tests by re-using the same PRNG (Sergiu
+   Deitsch)
+#. Improved docs for ``vcpkg`` installation. (Sergiu Deitsch)
+#. Update FindGlog.cmake to create glog::glog target (KrisThielemans@)
+#. Improve consistency & correctness of Sphere & Line Manifolds
+   (Julio L. Paneque)
+#. Remove ``ceres/internal/random.h`` in favor of ``<random>``.
+#. Fix a crash in ``InnerProductComputer`` (Sameer Agarwal)
+#. Various fixes to improve compilation on Windows using MinGW & MSVC
+   (Sergiu Deitsch)
+#. Fix fmin/fmax() to use Jet averaging on equality (Alex Stewart)
+#. Fix use of conditional preprocessor checks within a macro in tests
+   (Alex Stewart)
+#. Better support for ``cuda-memcheck`` (Joydeep Biswas)
+#. Improve the logic for linking to the platform specific threading
+   library (Sergiu Deitsch)
+#. Generate the version string at compile time (Sergiu Deitsch)
+#. :class:`NumericDiffFirstOrderFunction` can now take a dynamically
+   sized parameter vector. (Sameer Agarwal)
+#. Fix compilation with SuiteSparse 7.2.0 (Mark Shachkov)
+
+2.1.0
+=====
+
+New Features
+------------
+
+#. Support for CUDA based dense solvers - ``DENSE_QR``,
+   ``DENSE_NORMAL_CHOLESKY`` & ``DENSE_SCHUR`` (Joydeep Biswas, Sameer
+   Agarwal)
+
+#. :class:`Manifold` is the new
+   :class:`LocalParameterization`. Version 2.1 is the transition
+   release where users can use both :class:`LocalParameterization` as
+   well as :class:`Manifold` objects as they transition from the
+   former to the latter. :class:`LocalParameterization` will be
+   removed in version 2.2. There should be no numerical change to the
+   results as a result of this change. (Sameer Agarwal, Johannes Beck,
+   Sergiu Deitsch)
+
+#. A number of changes to :class:`Jet` s (Sergiu Deitsch)
+
+   * :class:`Jet` gained support for ``copysign``, ``fma`` (fused
+     multiply-add), ``midpoint`` (C++20 and above), ``lerp`` (C++20
+     and above), 3-argument ``hypot`` (C++17 and above), ``log10``,
+     ``log1p``, ``expm1``, ``norm`` (squared :math:`L^2` norm).
+
+   * Quiet floating-point comparison: ``isless``, ``isgreater``,
+     ``islessgreater``, ``islessequal``, ``isgreaterequal``,
+     ``isunordered``, ``signbit``, ``fdim``
+
+   * Categorization and comparison operations are now applied
+     exclusively and consistently to the scalar part of a Jet:
+     ``isnan``, ``isinf``, ``isnormal``, ``isfinite``, ``fpclassify``
+     (new), ``fmin``, ``fmax`` (see the sketch after this list).
+
+   * It is now possible to safely compare a :class:`Jet` against a scalar
+     (or literal) without constructing a :class:`Jet` first (even if it's
+     nested):
+
+     .. code-block:: c++
+
+        Jet<Jet<Jet<T, N>, M>, O> x;
+        if (x == 2) { } // equivalent to x.a.a.a == 2
+
+
+     This enables interaction with various arithmetic functions that
+     expect a scalar-like instance, such as ``boost::math::pow<-N>``
+     for reciprocal computation.
+
+#. Add :class:`NumericDiffFirstOrderFunction` (Sameer Agarwal)
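+
+A small sketch of the scalar-part-only classification semantics
+described above (the values are illustrative):
+
+.. code-block:: c++
+
+   #include <limits>
+   #include "ceres/jet.h"
+
+   ceres::Jet<double, 2> x(1.0, 0);  // scalar part 1.0, dx/dp0 = 1
+   x.v[1] = std::numeric_limits<double>::infinity();
+
+   // Classification only inspects the scalar part x.a, so x is still
+   // finite and not NaN even though a derivative component is infinite.
+   const bool finite = ceres::isfinite(x);  // true
+   const bool nan = ceres::isnan(x);        // false
+
+   // Consistent with comparisons, which also use only the scalar part.
+   const bool equal_to_one = (x == 1.0);    // true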
+
+
+Backward Incompatible API Changes
+---------------------------------
+
+#. :class:`LocalParameterization` is deprecated. It will be removed in
+   version 2.2. Use :class:`Manifold` instead.
+#. Classification functions like ``IsFinite`` are deprecated. Use the
+   ``C++11`` functions (``isfinite``, ``isnan``, etc.) going
+   forward. However, to maintain consistent behaviour with comparison
+   operators, these functions only inspect the scalar part of the
+   :class:`Jet`.
+
+Bug Fixes & Minor Changes
+-------------------------
+
+#. Worked around an MSVC ordering bug when using C++17/20 (Sergiu
+   Deitsch)
+#. Added a CITATION.cff file. (Sergiu Deitsch)
+#. Updated included gtest version to 1.11.0. This should fix some
+   ``C++20`` compilation problems. (Sameer Agarwal).
+#. Workaround ``MSVC`` ``STL`` deficiency in ``C++17`` mode (Sergiu
+   Deitsch)
+#. Fix ``Jet`` test failures on ``ARMv8`` with recent ``Xcode``
+   (Sergiu Deitsch)
+#. Fix unused arguments of ``Make1stOrderPerturbation`` (Dmitriy
+   Korchemkin)
+#. Fix ``SuiteSparse`` path and version reporting (Sergiu Deitsch)
+#. Enable `GitHub` workflows and deprecate ``TravisCI`` (Sergiu
+   Deitsch)
+#. Add missing includes (Sergiu Deitsch, Sameer Agarwal)
+#. Fix path for ``cuda-memcheck`` tests (Joydeep Biswas)
+#. ClangFormat cleanup (Sameer Agarwal)
+#. Set ``CMP0057`` policy for ``IN_LIST`` operator in
+   ``FindSuiteSparse.cmake`` (Brent Yi)
+#. Do not define unusable import targets (Sergiu Deitsch)
+#. Fix Ubuntu 18.04 shared library build (Sergiu Deitsch)
+#. Force ``C++`` linker when building the ``C`` API (Sergiu Deitsch)
+#. Modernize the code to be inline with ``C++14`` (Sergiu Deitsch,
+   Sameer Agarwal)
+#. Lots of fixes to make Ceres compile out of the box on Windows
+   (Sergiu Deitsch)
+#. Standardize path handling using ``GNUInstallDirs`` (Sergiu Deitsch)
+#. Add final specifier to classes to help the compiler with
+   devirtualization (Sameer Agarwal)
+#. Lots of cleanup & modernization of the CMake build files (Sergiu
+   Deitsch & Alex Stewart)
+#. Simplification to the symbol export logic (Sergiu Deitsch)
+#. Add cmake option ``ENABLE_BITCODE`` for iOS builds (John Harrison)
+#. Add const accessor for functor wrapped by auto/numeric-diff objects
+   (Alex Stewart)
+#. Cleanup & refactor ``jet_test.cc``. (Sameer Agarwal)
+#. Fix docs of supported sparse backends for mixed precision solvers
+   (Alex Stewart)
+#. Fix C++20 compilation (Sergiu Deitsch)
+#. Add an example for ``BiCubicInterpolator`` (Dmitriy Korchemkin)
+#. Add a section to the documentation on implicit and inverse function
+   theorems (Sameer Agarwal)
+#. Add a note about Triggs' correction (Sameer Agarwal)
+#. Fix the docs for ``Problem::RemoveResidualBlock`` &
+   ``Problem::RemoveParameterBlock`` (Sameer Agarwal)
+#. Fix an incorrect check in ``reorder_program.cc`` (William Gandler)
+#. Add ``function_tolerance`` based convergence testing to ``TinySolver``
+   (Sameer Agarwal).
+#. Fix a number of typos in ``rotation.h`` (@yiping)
+#. Fix a typo in ``interfacing_with_autodiff.rst`` (@tangobravo)
+#. Fix a matrix sizing bug in ``covariance_impl.cc`` (William Gandler)
+#. Fix a bug in ``system_test.cc`` (William Gandler)
+#. Fix the Jacobian computation in ``trust_region_minimizer_test.cc``
+   (William Gandler)
+#. Fix a bug in ``local_parameterization_test.cc`` (William Gandler)
+#. Add accessors to ``GradientProblem`` (Sameer Agarwal)
+#. Refactor ``small_blas_gemm_benchmark`` (Ahmed Taei)
+#. Refactor ``small_blas_test`` (Ahmed Taei)
+#. Fix dependency check for building documentation (Sumit Dey)
+#. Fix an errant double link in the docs (Timon Knigge)
+#. Fix a typo in the version history (Noah Snavely)
+#. Fix typo in LossFunctionWrapper sample code (Dmitriy Korchemkin)
+#. Add fmax/fmin overloads for scalars (Alexander Karatarakis)
+#. Introduce benchmarks for ``Jet`` operations (Alexander Karatarakis)
+#. Fix typos in documentation and fix the documentation for
+   ``IterationSummary`` (Alexander Karatarakis)
+#. Do not check MaxNumThreadsAvailable if the number of threads is
+   set to 1. (Fuhao Shi)
+#. Add a macro ``CERES_GET_FLAG``. (Sameer Agarwal)
+#. Reduce log spam in ``covariance_impl.cc`` (Daniel Henell)
+#. Fix FindTBB version detection with TBB >= 2021.1.1 (Alex Stewart)
+#. Fix Eigen3_VERSION (Florian Berchtold)
+#. Allow Unity Build (Tobias Schluter)
+#. Make miniglog's InitGoogleLogging argument const (Tobias Schluter)
+#. Use portable expression for constant 2/sqrt(pi) (Tobias Schluter)
+#. Fix a number of compile errors and warnings (Austin Schuh),
+   including:
+
+   * ``format not a string literal``
+   * ``-Wno-maybe-uninitialized error``
+   * ``nonnull arg compared to NULL``
+   * ``-Wno-format-nonliteral``
+   * ``-Wmissing-field-initializers``
+   * ``-Werror``
+
+#. Fix ``cc_binary`` includes so examples build as an external repo
+   (Austin Schuh)
+#. Fix an explicit double in TinySolver (Bogdan Burlacu)
+#. Fix unit quaternion rotation (Mykyta Kozlov)
+
+
 2.0.0
 =====
 
@@ -1532,8 +1843,8 @@
 =======
 
 Ceres Solver grew out of the need for general least squares solving at
-Google. In early 2010, Sameer Agarwal and Fredrik Schaffalitzky
-started the development of Ceres Solver. Fredrik left Google shortly
+Google. In early 2010, Sameer Agarwal and Frederik Schaffalitzky
+started the development of Ceres Solver. Frederik left Google shortly
 thereafter and Keir Mierle stepped in to take his place. After two
 years of on-and-off development, Ceres Solver was released as open
 source in May of 2012.