diff --git a/docs/source/CMakeLists.txt b/docs/source/CMakeLists.txt
index 70bf998..1f8fed8 100644
--- a/docs/source/CMakeLists.txt
+++ b/docs/source/CMakeLists.txt
@@ -1,19 +1,21 @@
-find_package(Sphinx REQUIRED)
-
 # HTML output directory
 set(SPHINX_HTML_DIR "${Ceres_BINARY_DIR}/docs/html")
 
 # Install documentation
 install(DIRECTORY ${SPHINX_HTML_DIR}
-        DESTINATION "${CERES_DOCS_INSTALL_DIR}"
+        DESTINATION ${CMAKE_INSTALL_DOCDIR}
         COMPONENT Doc
         PATTERN "${SPHINX_HTML_DIR}/*")
 
+# Find python
+find_package(Python REQUIRED COMPONENTS Interpreter)
+
 # Building using 'make_docs.py' python script
 add_custom_target(ceres_docs ALL
-                  python
+                  $<TARGET_FILE:Python::Interpreter>
                   "${Ceres_SOURCE_DIR}/scripts/make_docs.py"
                   "${Ceres_SOURCE_DIR}"
                   "${Ceres_BINARY_DIR}/docs"
-                  "${SPHINX_EXECUTABLE}"
+                  "${Sphinx_BUILD_EXECUTABLE}"
+                  USES_TERMINAL
                   COMMENT "Building HTML documentation with Sphinx")
diff --git a/docs/source/automatic_derivatives.rst b/docs/source/automatic_derivatives.rst
index e15e911..3fe0727 100644
--- a/docs/source/automatic_derivatives.rst
+++ b/docs/source/automatic_derivatives.rst
@@ -37,9 +37,7 @@
   };
 
 
-  CostFunction* cost_function =
-        new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
-          new Rat43CostFunctor(x, y));
+  auto* cost_function = new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(x, y);
 
 Notice that compared to numeric differentiation, the only difference
 when defining the functor for use with automatic differentiation is
@@ -298,7 +296,7 @@
 
 There is no single solution to this problem. In some cases one needs
 to reason explicitly about the points where indeterminacy may occur
-and use alternate expressions using `L'Hopital's rule
+and use alternate expressions using `L'Hôpital's rule
 <https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
 example some of the conversion routines in `rotation.h
 <https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_. In
diff --git a/docs/source/bibliography.rst b/docs/source/bibliography.rst
index c13c676..ba3bc87 100644
--- a/docs/source/bibliography.rst
+++ b/docs/source/bibliography.rst
@@ -4,6 +4,20 @@
 Bibliography
 ============
 
+Background Reading
+==================
+
+For a short but informative introduction to the subject we recommend
+the booklet by [Madsen]_ . For a general introduction to non-linear
+optimization we recommend [NocedalWright]_. [Bjorck]_ remains the
+seminal reference on least squares problems. [TrefethenBau]_ is our
+favorite text on introductory numerical linear algebra. [Triggs]_
+provides a thorough coverage of the bundle adjustment problem.
+
+
+References
+==========
+
 .. [Agarwal] S. Agarwal, N. Snavely, S. M. Seitz and R. Szeliski,
    **Bundle Adjustment in the Large**, *Proceedings of the European
    Conference on Computer Vision*, pp. 29--42, 2010.
@@ -31,6 +45,9 @@
 .. [Conn] A.R. Conn, N.I.M. Gould, and P.L. Toint, **Trust region
    methods**, *Society for Industrial Mathematics*, 2000.
 
+.. [Davis] Timothy A. Davis, **Direct methods for Sparse Linear
+   Systems**, *SIAM*, 2006.
+
 .. [Dellaert] F. Dellaert, J. Carlson, V. Ila, K. Ni and C. E. Thorpe,
    **Subgraph-preconditioned conjugate gradients for large scale SLAM**,
    *International Conference on Intelligent Robots and Systems*, 2010.
@@ -44,7 +61,7 @@
    Preconditioners for Sparse Linear Least-Squares Problems**,
    *ACM Trans. Math. Softw.*, 43(4), 2017.
 
-.. [HartleyZisserman] R.I. Hartley & A. Zisserman, **Multiview
+.. [HartleyZisserman] R.I. Hartley and A. Zisserman, **Multiview
    Geometry in Computer Vision**, Cambridge University Press, 2004.
 
 .. [Hertzberg] C. Hertzberg, R. Wagner, U. Frese and L. Schroder,
@@ -79,6 +96,11 @@
    preconditioner for large sparse least squares problems**, *SIAM
    Journal on Matrix Analysis and Applications*, 28(2):524-550, 2007.
 
+.. [LourakisArgyros] M. L. A. Lourakis, A. A. Argyros, **Is
+   Levenberg-Marquardt the most efficient algorithm for implementing
+   bundle adjustment?**, *International Conference on Computer
+   Vision*, 2005.
+
 .. [Madsen] K. Madsen, H.B. Nielsen, and O. Tingleff, **Methods for
    nonlinear least squares problems**, 2004.
 
@@ -100,7 +122,7 @@
 .. [Nocedal] J. Nocedal, **Updating Quasi-Newton Matrices with Limited
    Storage**, *Mathematics of Computation*, 35(151): 773--782, 1980.
 
-.. [NocedalWright] J. Nocedal & S. Wright, **Numerical Optimization**,
+.. [NocedalWright] J. Nocedal and S. Wright, **Numerical Optimization**,
    Springer, 2004.
 
 .. [Oren] S. S. Oren, **Self-scaling Variable Metric (SSVM) Algorithms
@@ -108,7 +130,7 @@
    20(5), 863-874, 1974.
 
 .. [Press] W. H. Press, S. A. Teukolsky, W. T. Vetterling
-   & B. P. Flannery, **Numerical Recipes**, Cambridge University
+   and B. P. Flannery, **Numerical Recipes**, Cambridge University
    Press, 2007.
 
 .. [Ridders] C. J. F. Ridders, **Accurate computation of F'(x) and
@@ -122,27 +144,37 @@
    systems**, SIAM, 2003.
 
 .. [Simon] I. Simon, N. Snavely and S. M. Seitz, **Scene Summarization
-   for Online Image Collections**, *International Conference on Computer Vision*, 2007.
+   for Online Image Collections**, *International Conference on
+   Computer Vision*, 2007.
 
 .. [Stigler] S. M. Stigler, **Gauss and the invention of least
    squares**, *The Annals of Statistics*, 9(3):465-474, 1981.
 
-.. [TenenbaumDirector] J. Tenenbaum & B. Director, **How Gauss
+.. [TenenbaumDirector] J. Tenenbaum and B. Director, **How Gauss
    Determined the Orbit of Ceres**.
 
 .. [TrefethenBau] L.N. Trefethen and D. Bau, **Numerical Linear
    Algebra**, SIAM, 1997.
 
-.. [Triggs] B. Triggs, P. F. Mclauchlan, R. I. Hartley &
+.. [Triggs] B. Triggs, P. F. Mclauchlan, R. I. Hartley and
    A. W. Fitzgibbon, **Bundle Adjustment: A Modern Synthesis**,
    Proceedings of the International Workshop on Vision Algorithms:
    Theory and Practice, pp. 298-372, 1999.
 
+.. [Weber] S. Weber, N. Demmel, TC Chan, D. Cremers, **Power Bundle
+   Adjustment for Large-Scale 3D Reconstruction**, *IEEE Conference on
+   Computer Vision and Pattern Recognition*, 2023.
+
 .. [Wiberg] T. Wiberg, **Computation of principal components when data
    are missing**, In Proc. *Second Symp. Computational Statistics*,
    pages 229-236, 1976.
 
-.. [WrightHolt] S. J. Wright and J. N. Holt, **An Inexact
-   Levenberg Marquardt Method for Large Sparse Nonlinear Least
-   Squares**, *Journal of the Australian Mathematical Society Series
-   B*, 26(4):387-403, 1985.
+.. [WrightHolt] S. J. Wright and J. N. Holt, **An Inexact Levenberg
+   Marquardt Method for Large Sparse Nonlinear Least Squares**,
+   *Journal of the Australian Mathematical Society Series B*,
+   26(4):387-403, 1985.
+
+.. [Zheng] Q. Zheng, Y. Xi and Y. Saad, **A power Schur Complement
+   low-rank correction preconditioner for general sparse linear
+   systems**, *SIAM Journal on Matrix Analysis and
+   Applications*, 2021.
diff --git a/docs/source/conf.py b/docs/source/conf.py
index c83468f..faa2403 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -25,7 +25,7 @@
 
 # Add any Sphinx extension module names here, as strings. They can be extensions
 # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
-extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax', 'sphinx.ext.ifconfig']
+extensions = ['sphinx.ext.todo', 'sphinx.ext.mathjax', 'sphinx.ext.ifconfig', 'sphinxcontrib.jquery']
 
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
@@ -41,16 +41,16 @@
 
 # General information about the project.
 project = u'Ceres Solver'
-copyright = u'2020 Google Inc'
+copyright = u'2023 Google Inc'
 
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
 # built documents.
 #
 # The short X.Y version.
-version = '2.0'
+version = '2.2'
 # The full version, including alpha/beta/rc tags.
-release = '2.0.0'
+release = '2.2.0'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
@@ -246,7 +246,7 @@
 # By default MathJax does not use TeX fonts, which is a tragedy. Also
 # scaling the fonts down a bit makes them fit better with font sizing
 # in the "Read The Docs" theme.
-mathjax_config = {
+mathjax3_config = {
 'HTML-CSS' : {
         'availableFonts' : ["TeX"],
         'scale' : 90
diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst
index a128e30..cba274c 100644
--- a/docs/source/contributing.rst
+++ b/docs/source/contributing.rst
@@ -118,8 +118,8 @@
 
    When the push succeeds, the console will display a URL showing the
    address of the review. Go to the URL and add at least one of the
-   maintainers (Sameer Agarwal, Keir Mierle, Alex Stewart or William
-   Rucklidge) as reviewers.
+   maintainers (Sameer Agarwal, Keir Mierle, Alex Stewart, William
+   Rucklidge or Sergiu Deitsch) as reviewers.
 
 3. Wait for a review.
 
diff --git a/docs/source/derivatives.rst b/docs/source/derivatives.rst
index bff6a29..d9a52b0 100644
--- a/docs/source/derivatives.rst
+++ b/docs/source/derivatives.rst
@@ -58,3 +58,4 @@
    numerical_derivatives
    automatic_derivatives
    interfacing_with_autodiff
+   inverse_and_implicit_function_theorems
diff --git a/docs/source/faqs.rst b/docs/source/faqs.rst
index 5a28f41..65c64e6 100644
--- a/docs/source/faqs.rst
+++ b/docs/source/faqs.rst
@@ -16,14 +16,3 @@
 
    modeling_faqs
    solving_faqs
-
-
-Further Reading
-===============
-
-For a short but informative introduction to the subject we recommend
-the booklet by [Madsen]_ . For a general introduction to non-linear
-optimization we recommend [NocedalWright]_. [Bjorck]_ remains the
-seminal reference on least squares problems. [TrefethenBau]_ book is
-our favorite text on introductory numerical linear algebra. [Triggs]_
-provides a thorough coverage of the bundle adjustment problem.
diff --git a/docs/source/features.rst b/docs/source/features.rst
index 724d6dc..609f41c 100644
--- a/docs/source/features.rst
+++ b/docs/source/features.rst
@@ -1,11 +1,15 @@
+.. default-domain:: cpp
+
+.. cpp:namespace:: ceres
+
 ====
 Why?
 ====
 .. _chapter-features:
 
 * **Code Quality** - Ceres Solver has been used in production at
-  Google for more than four years now. It is clean, extensively tested
-  and well documented code that is actively developed and supported.
+  Google since 2011. It is clean, extensively tested and well
+  documented code that is actively developed and supported.
 
 * **Modeling API** - It is rarely the case that one starts with the
   exact and complete formulation of the problem that one is trying to
@@ -27,10 +31,10 @@
     allows the user to *shape* their residuals using a
     :class:`LossFunction` to reduce the influence of outliers.
 
-  - **Local Parameterization** In many cases, some parameters lie on a
-    manifold other than Euclidean space, e.g., rotation matrices. In
-    such cases, the user can specify the geometry of the local tangent
-    space by specifying a :class:`LocalParameterization` object.
+  - **Manifolds** In many cases, some parameters lie on a manifold
+    other than Euclidean space, e.g., rotation matrices. In such
+    cases, the user can specify the geometry of the local tangent
+    space by specifying a :class:`Manifold` object.
 
 * **Solver Choice** Depending on the size, sparsity structure, time &
   memory budgets, and solution quality requirements, different
@@ -42,10 +46,11 @@
     computational cost in all of these methods is the solution of a
     linear system. To this end Ceres ships with a variety of linear
     solvers - dense QR and dense Cholesky factorization (using
-    `Eigen`_ or `LAPACK`_) for dense problems, sparse Cholesky
-    factorization (`SuiteSparse`_, `CXSparse`_ or `Eigen`_) for large
-    sparse problems, custom Schur complement based dense, sparse, and
-    iterative linear solvers for `bundle adjustment`_ problems.
+    `Eigen`_, `LAPACK`_ or `CUDA`_) for dense problems, sparse
+    Cholesky factorization (`SuiteSparse`_, `Apple's Accelerate`_,
+    `Eigen`_) for large sparse problems, custom Schur complement based
+    dense, sparse, and iterative linear solvers for `bundle
+    adjustment`_ problems.
 
   - **Line Search Solvers** - When the problem size is so large that
     storing and factoring the Jacobian is not feasible or a low
@@ -54,18 +59,21 @@
     of Non-linear Conjugate Gradients, BFGS and LBFGS.
 
 * **Speed** - Ceres Solver has been extensively optimized, with C++
-  templating, hand written linear algebra routines and OpenMP or
-  modern C++ threads based multithreading of the Jacobian evaluation
-  and the linear solvers.
+  templating, hand written linear algebra routines and modern C++
+  threads based multithreading of the Jacobian evaluation and the
+  linear solvers.
 
-* **Solution Quality** Ceres is the `best performing`_ solver on the NIST
-  problem set used by Mondragon and Borchers for benchmarking
+* **GPU Acceleration** If your system supports `CUDA`_ then Ceres
+  Solver can use the Nvidia GPU on your system to speed up the solver.
+
+* **Solution Quality** Ceres is the `best performing`_ solver on the
+  NIST problem set used by Mondragon and Borchers for benchmarking
   non-linear least squares solvers.
 
 * **Covariance estimation** - Evaluate the sensitivity/uncertainty of
   the solution by evaluating all or part of the covariance
-  matrix. Ceres is one of the few solvers that allows you to do
-  this analysis at scale.
+  matrix. Ceres is one of the few solvers that allows you to do this
+  analysis at scale.
 
 * **Community** Since its release as an open source software, Ceres
   has developed an active developer community that contributes new
@@ -82,6 +90,7 @@
 .. _SuiteSparse: http://www.cise.ufl.edu/research/sparse/SuiteSparse/
 .. _Eigen: http://eigen.tuxfamily.org/
 .. _LAPACK: http://www.netlib.org/lapack/
-.. _CXSparse: https://www.cise.ufl.edu/research/sparse/CXSparse/
 .. _automatic: http://en.wikipedia.org/wiki/Automatic_differentiation
 .. _numeric: http://en.wikipedia.org/wiki/Numerical_differentiation
+.. _CUDA : https://developer.nvidia.com/cuda-toolkit
+.. _Apple's Accelerate: https://developer.apple.com/documentation/accelerate/sparse_solvers
diff --git a/docs/source/gradient_solver.rst b/docs/source/gradient_solver.rst
index dde9d7e..4e3fc71 100644
--- a/docs/source/gradient_solver.rst
+++ b/docs/source/gradient_solver.rst
@@ -2,6 +2,8 @@
 
 .. default-domain:: cpp
 
+.. cpp:namespace:: ceres
+
 .. _chapter-gradient_problem_solver:
 
 ==================================
@@ -54,9 +56,9 @@
    public:
     explicit GradientProblem(FirstOrderFunction* function);
     GradientProblem(FirstOrderFunction* function,
-                    LocalParameterization* parameterization);
+                    Manifold* manifold);
     int NumParameters() const;
-    int NumLocalParameters() const;
+    int NumTangentParameters() const;
     bool Evaluate(const double* parameters, double* cost, double* gradient) const;
     bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
   };
@@ -69,20 +71,18 @@
 form of the objective function.
 
 Structurally :class:`GradientProblem` is a composition of a
-:class:`FirstOrderFunction` and optionally a
-:class:`LocalParameterization`.
+:class:`FirstOrderFunction` and optionally a :class:`Manifold`.
 
 The :class:`FirstOrderFunction` is responsible for evaluating the cost
 and gradient of the objective function.
 
-The :class:`LocalParameterization` is responsible for going back and
-forth between the ambient space and the local tangent space. When a
-:class:`LocalParameterization` is not provided, then the tangent space
-is assumed to coincide with the ambient Euclidean space that the
-gradient vector lives in.
+The :class:`Manifold` is responsible for going back and forth between the
+ambient space and the local tangent space. When a :class:`Manifold` is not
+provided, then the tangent space is assumed to coincide with the ambient
+Euclidean space that the gradient vector lives in.
 
 The constructor takes ownership of the :class:`FirstOrderFunction` and
-:class:`LocalParamterization` objects passed to it.
+:class:`Manifold` objects passed to it.
 
 
 .. function:: void Solve(const GradientProblemSolver::Options& options, const GradientProblem& problem, double* parameters, GradientProblemSolver::Summary* summary)
@@ -103,7 +103,7 @@
    behavior of the solver. We list the various settings and their
    default values below.
 
-.. function:: bool GradientProblemSolver::Options::IsValid(string* error) const
+.. function:: bool GradientProblemSolver::Options::IsValid(std::string* error) const
 
    Validate the values in the options struct and returns true on
    success. If there is a problem, the method returns false with
@@ -123,7 +123,7 @@
    Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
    Note that in order for the assumptions underlying the ``BFGS`` and
    ``LBFGS`` line search direction algorithms to be guaranteed to be
-   satisifed, the ``WOLFE`` line search should be used.
+   satisfied, the ``WOLFE`` line search should be used.
 
 .. member:: NonlinearConjugateGradientType GradientProblemSolver::Options::nonlinear_conjugate_gradient_type
 
@@ -192,7 +192,7 @@
   low-sensitivity parameters. It can also reduce the robustness of the
   solution to errors in the Jacobians.
 
-.. member:: LineSearchIterpolationType GradientProblemSolver::Options::line_search_interpolation_type
+.. member:: LineSearchInterpolationType GradientProblemSolver::Options::line_search_interpolation_type
 
    Default: ``CUBIC``
 
@@ -342,8 +342,8 @@
 
    where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
    is projection onto the bounds constraints and :math:`\boxplus` is
-   Plus operation for the overall local parameterization associated
-   with the parameter vector.
+   Plus operation for the manifold associated with the parameter
+   vector.
 
 .. member:: double GradientProblemSolver::Options::parameter_tolerance
 
@@ -388,14 +388,14 @@
    #. ``it`` is the time take by the current iteration.
    #. ``tt`` is the total time taken by the minimizer.
 
-.. member:: vector<IterationCallback> GradientProblemSolver::Options::callbacks
+.. member:: std::vector<IterationCallback> GradientProblemSolver::Options::callbacks
 
    Callbacks that are executed at the end of each iteration of the
    :class:`Minimizer`. They are executed in the order that they are
    specified in this vector. By default, parameter blocks are updated
    only at the end of the optimization, i.e., when the
    :class:`Minimizer` terminates. This behavior is controlled by
-   :member:`GradientProblemSolver::Options::update_state_every_variable`. If
+   :member:`GradientProblemSolver::Options::update_state_every_iteration`. If
    the user wishes to have access to the update parameter blocks when
    his/her callbacks are executed, then set
    :member:`GradientProblemSolver::Options::update_state_every_iteration`
@@ -404,7 +404,7 @@
    The solver does NOT take ownership of these pointers.
 
 
-.. member:: bool Solver::Options::update_state_every_iteration
+.. member:: bool GradientProblemSolver::Options::update_state_every_iteration
 
    Default: ``false``
 
@@ -420,12 +420,12 @@
 
    Summary of the various stages of the solver after termination.
 
-.. function:: string GradientProblemSolver::Summary::BriefReport() const
+.. function:: std::string GradientProblemSolver::Summary::BriefReport() const
 
    A brief one line description of the state of the solver after
    termination.
 
-.. function:: string GradientProblemSolver::Summary::FullReport() const
+.. function:: std::string GradientProblemSolver::Summary::FullReport() const
 
    A full multiline description of the state of the solver after
    termination.
@@ -444,7 +444,7 @@
 
    The cause of the minimizer terminating.
 
-.. member:: string GradientProblemSolver::Summary::message
+.. member:: std::string GradientProblemSolver::Summary::message
 
    Reason why the solver terminated.
 
@@ -458,7 +458,7 @@
    Cost of the problem (value of the objective function) after the
    optimization.
 
-.. member:: vector<IterationSummary> GradientProblemSolver::Summary::iterations
+.. member:: std::vector<IterationSummary> GradientProblemSolver::Summary::iterations
 
    :class:`IterationSummary` for each minimizer iteration in order.
 
@@ -486,11 +486,11 @@
 
    Number of parameters in the problem.
 
-.. member:: int GradientProblemSolver::Summary::num_local_parameters
+.. member:: int GradientProblemSolver::Summary::num_tangent_parameters
 
    Dimension of the tangent space of the problem. This is different
    from :member:`GradientProblemSolver::Summary::num_parameters` if a
-   :class:`LocalParameterization` object is used.
+   :class:`Manifold` object is used.
 
 .. member:: LineSearchDirectionType GradientProblemSolver::Summary::line_search_direction_type
 
diff --git a/docs/source/gradient_tutorial.rst b/docs/source/gradient_tutorial.rst
index 3fef6b6..2af44e1 100644
--- a/docs/source/gradient_tutorial.rst
+++ b/docs/source/gradient_tutorial.rst
@@ -2,52 +2,53 @@
 
 .. default-domain:: cpp
 
+.. cpp:namespace:: ceres
+
 .. _chapter-gradient_tutorial:
 
 ==================================
 General Unconstrained Minimization
 ==================================
 
-While much of Ceres Solver is devoted to solving non-linear least
-squares problems, internally it contains a solver that can solve
-general unconstrained optimization problems using just their objective
-function value and gradients. The ``GradientProblem`` and
-``GradientProblemSolver`` objects give the user access to this solver.
-
-So without much further ado, let us look at how one goes about using
-them.
+Ceres Solver besides being able to solve non-linear least squares
+problem can also solve general unconstrained problems using just their
+objective function value and gradients. In this chapter we will see
+how to do this.
 
 Rosenbrock's Function
 =====================
 
-We consider the minimization of the famous `Rosenbrock's function
+Consider minimizing the famous `Rosenbrock's function
 <http://en.wikipedia.org/wiki/Rosenbrock_function>`_ [#f1]_.
 
-We begin by defining an instance of the ``FirstOrderFunction``
-interface. This is the object that is responsible for computing the
-objective function value and the gradient (if required). This is the
-analog of the :class:`CostFunction` when defining non-linear least
-squares problems in Ceres.
+The simplest way to minimize is to define a templated functor to
+evaluate the objective value of this function and then use Ceres
+Solver's automatic differentiation to compute its derivatives.
+
+We begin by defining a templated functor and then using
+``AutoDiffFirstOrderFunction`` to construct an instance of the
+``FirstOrderFunction`` interface. This is the object that is
+responsible for computing the objective function value and the
+gradient (if required). This is the analog of the
+:class:`CostFunction` when defining non-linear least squares problems
+in Ceres.
 
 .. code::
 
-  class Rosenbrock : public ceres::FirstOrderFunction {
-   public:
-    virtual bool Evaluate(const double* parameters,
-                          double* cost,
-                          double* gradient) const {
-      const double x = parameters[0];
-      const double y = parameters[1];
-
+  // f(x,y) = (1-x)^2 + 100(y - x^2)^2;
+  struct Rosenbrock {
+    template <typename T>
+    bool operator()(const T* parameters, T* cost) const {
+      const T x = parameters[0];
+      const T y = parameters[1];
       cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
-      if (gradient != nullptr) {
-        gradient[0] = -2.0 * (1.0 - x) - 200.0 * (y - x * x) * 2.0 * x;
-        gradient[1] = 200.0 * (y - x * x);
-      }
       return true;
     }
 
-    virtual int NumParameters() const { return 2; }
+    static ceres::FirstOrderFunction* Create() {
+      constexpr int kNumParameters = 2;
+      return new ceres::AutoDiffFirstOrderFunction<Rosenbrock, kNumParameters>();
+    }
   };
 
 
@@ -58,7 +59,7 @@
 
     double parameters[2] = {-1.2, 1.0};
 
-    ceres::GradientProblem problem(new Rosenbrock());
+    ceres::GradientProblem problem(Rosenbrock::Create());
 
     ceres::GradientProblemSolver::Options options;
     options.minimizer_progress_to_stdout = true;
@@ -74,65 +75,130 @@
 
 .. code-block:: bash
 
-     0: f: 2.420000e+01 d: 0.00e+00 g: 2.16e+02 h: 0.00e+00 s: 0.00e+00 e:  0 it: 2.00e-05 tt: 2.00e-05
-     1: f: 4.280493e+00 d: 1.99e+01 g: 1.52e+01 h: 2.01e-01 s: 8.62e-04 e:  2 it: 7.32e-05 tt: 2.19e-04
-     2: f: 3.571154e+00 d: 7.09e-01 g: 1.35e+01 h: 3.78e-01 s: 1.34e-01 e:  3 it: 2.50e-05 tt: 2.68e-04
-     3: f: 3.440869e+00 d: 1.30e-01 g: 1.73e+01 h: 1.36e-01 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 2.92e-04
-     4: f: 3.213597e+00 d: 2.27e-01 g: 1.55e+01 h: 1.06e-01 s: 4.59e-01 e:  1 it: 2.86e-06 tt: 3.14e-04
-     5: f: 2.839723e+00 d: 3.74e-01 g: 1.05e+01 h: 1.34e-01 s: 5.24e-01 e:  1 it: 2.86e-06 tt: 3.36e-04
-     6: f: 2.448490e+00 d: 3.91e-01 g: 1.29e+01 h: 3.04e-01 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 3.58e-04
-     7: f: 1.943019e+00 d: 5.05e-01 g: 4.00e+00 h: 8.81e-02 s: 7.43e-01 e:  1 it: 4.05e-06 tt: 3.79e-04
-     8: f: 1.731469e+00 d: 2.12e-01 g: 7.36e+00 h: 1.71e-01 s: 4.60e-01 e:  2 it: 9.06e-06 tt: 4.06e-04
-     9: f: 1.503267e+00 d: 2.28e-01 g: 6.47e+00 h: 8.66e-02 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 4.33e-04
-    10: f: 1.228331e+00 d: 2.75e-01 g: 2.00e+00 h: 7.70e-02 s: 7.90e-01 e:  1 it: 3.81e-06 tt: 4.54e-04
-    11: f: 1.016523e+00 d: 2.12e-01 g: 5.15e+00 h: 1.39e-01 s: 3.76e-01 e:  2 it: 1.00e-05 tt: 4.82e-04
-    12: f: 9.145773e-01 d: 1.02e-01 g: 6.74e+00 h: 7.98e-02 s: 1.00e+00 e:  1 it: 3.10e-06 tt: 5.03e-04
-    13: f: 7.508302e-01 d: 1.64e-01 g: 3.88e+00 h: 5.76e-02 s: 4.93e-01 e:  1 it: 2.86e-06 tt: 5.25e-04
-    14: f: 5.832378e-01 d: 1.68e-01 g: 5.56e+00 h: 1.42e-01 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 5.47e-04
-    15: f: 3.969581e-01 d: 1.86e-01 g: 1.64e+00 h: 1.17e-01 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 5.68e-04
-    16: f: 3.171557e-01 d: 7.98e-02 g: 3.84e+00 h: 1.18e-01 s: 3.97e-01 e:  2 it: 9.06e-06 tt: 5.94e-04
-    17: f: 2.641257e-01 d: 5.30e-02 g: 3.27e+00 h: 6.14e-02 s: 1.00e+00 e:  1 it: 3.10e-06 tt: 6.16e-04
-    18: f: 1.909730e-01 d: 7.32e-02 g: 5.29e-01 h: 8.55e-02 s: 6.82e-01 e:  1 it: 4.05e-06 tt: 6.42e-04
-    19: f: 1.472012e-01 d: 4.38e-02 g: 3.11e+00 h: 1.20e-01 s: 3.47e-01 e:  2 it: 1.00e-05 tt: 6.69e-04
-    20: f: 1.093558e-01 d: 3.78e-02 g: 2.97e+00 h: 8.43e-02 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 6.91e-04
-    21: f: 6.710346e-02 d: 4.23e-02 g: 1.42e+00 h: 9.64e-02 s: 8.85e-01 e:  1 it: 3.81e-06 tt: 7.12e-04
-    22: f: 3.993377e-02 d: 2.72e-02 g: 2.30e+00 h: 1.29e-01 s: 4.63e-01 e:  2 it: 9.06e-06 tt: 7.39e-04
-    23: f: 2.911794e-02 d: 1.08e-02 g: 2.55e+00 h: 6.55e-02 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 7.62e-04
-    24: f: 1.457683e-02 d: 1.45e-02 g: 2.77e-01 h: 6.37e-02 s: 6.14e-01 e:  1 it: 3.81e-06 tt: 7.84e-04
-    25: f: 8.577515e-03 d: 6.00e-03 g: 2.86e+00 h: 1.40e-01 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 8.05e-04
-    26: f: 3.486574e-03 d: 5.09e-03 g: 1.76e-01 h: 1.23e-02 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 8.27e-04
-    27: f: 1.257570e-03 d: 2.23e-03 g: 1.39e-01 h: 5.08e-02 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 8.48e-04
-    28: f: 2.783568e-04 d: 9.79e-04 g: 6.20e-01 h: 6.47e-02 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 8.69e-04
-    29: f: 2.533399e-05 d: 2.53e-04 g: 1.68e-02 h: 1.98e-03 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 8.91e-04
-    30: f: 7.591572e-07 d: 2.46e-05 g: 5.40e-03 h: 9.27e-03 s: 1.00e+00 e:  1 it: 3.81e-06 tt: 9.12e-04
-    31: f: 1.902460e-09 d: 7.57e-07 g: 1.62e-03 h: 1.89e-03 s: 1.00e+00 e:  1 it: 2.86e-06 tt: 9.33e-04
-    32: f: 1.003030e-12 d: 1.90e-09 g: 3.50e-05 h: 3.52e-05 s: 1.00e+00 e:  1 it: 3.10e-06 tt: 9.54e-04
-    33: f: 4.835994e-17 d: 1.00e-12 g: 1.05e-07 h: 1.13e-06 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 9.81e-04
-    34: f: 1.885250e-22 d: 4.84e-17 g: 2.69e-10 h: 1.45e-08 s: 1.00e+00 e:  1 it: 4.05e-06 tt: 1.00e-03
+       0: f: 2.420000e+01 d: 0.00e+00 g: 2.16e+02 h: 0.00e+00 s: 0.00e+00 e:  0 it: 1.19e-05 tt: 1.19e-05
+       1: f: 4.280493e+00 d: 1.99e+01 g: 1.52e+01 h: 2.01e-01 s: 8.62e-04 e:  2 it: 7.30e-05 tt: 1.72e-04
+       2: f: 3.571154e+00 d: 7.09e-01 g: 1.35e+01 h: 3.78e-01 s: 1.34e-01 e:  3 it: 1.60e-05 tt: 1.93e-04
+       3: f: 3.440869e+00 d: 1.30e-01 g: 1.73e+01 h: 1.36e-01 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 1.97e-04
+       4: f: 3.213597e+00 d: 2.27e-01 g: 1.55e+01 h: 1.06e-01 s: 4.59e-01 e:  1 it: 1.19e-06 tt: 2.00e-04
+       5: f: 2.839723e+00 d: 3.74e-01 g: 1.05e+01 h: 1.34e-01 s: 5.24e-01 e:  1 it: 9.54e-07 tt: 2.03e-04
+       6: f: 2.448490e+00 d: 3.91e-01 g: 1.29e+01 h: 3.04e-01 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 2.05e-04
+       7: f: 1.943019e+00 d: 5.05e-01 g: 4.00e+00 h: 8.81e-02 s: 7.43e-01 e:  1 it: 9.54e-07 tt: 2.08e-04
+       8: f: 1.731469e+00 d: 2.12e-01 g: 7.36e+00 h: 1.71e-01 s: 4.60e-01 e:  2 it: 2.15e-06 tt: 2.11e-04
+       9: f: 1.503267e+00 d: 2.28e-01 g: 6.47e+00 h: 8.66e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 2.14e-04
+      10: f: 1.228331e+00 d: 2.75e-01 g: 2.00e+00 h: 7.70e-02 s: 7.90e-01 e:  1 it: 0.00e+00 tt: 2.16e-04
+      11: f: 1.016523e+00 d: 2.12e-01 g: 5.15e+00 h: 1.39e-01 s: 3.76e-01 e:  2 it: 1.91e-06 tt: 2.25e-04
+      12: f: 9.145773e-01 d: 1.02e-01 g: 6.74e+00 h: 7.98e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 2.28e-04
+      13: f: 7.508302e-01 d: 1.64e-01 g: 3.88e+00 h: 5.76e-02 s: 4.93e-01 e:  1 it: 9.54e-07 tt: 2.30e-04
+      14: f: 5.832378e-01 d: 1.68e-01 g: 5.56e+00 h: 1.42e-01 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 2.33e-04
+      15: f: 3.969581e-01 d: 1.86e-01 g: 1.64e+00 h: 1.17e-01 s: 1.00e+00 e:  1 it: 1.19e-06 tt: 2.36e-04
+      16: f: 3.171557e-01 d: 7.98e-02 g: 3.84e+00 h: 1.18e-01 s: 3.97e-01 e:  2 it: 1.91e-06 tt: 2.39e-04
+      17: f: 2.641257e-01 d: 5.30e-02 g: 3.27e+00 h: 6.14e-02 s: 1.00e+00 e:  1 it: 1.19e-06 tt: 2.42e-04
+      18: f: 1.909730e-01 d: 7.32e-02 g: 5.29e-01 h: 8.55e-02 s: 6.82e-01 e:  1 it: 9.54e-07 tt: 2.45e-04
+      19: f: 1.472012e-01 d: 4.38e-02 g: 3.11e+00 h: 1.20e-01 s: 3.47e-01 e:  2 it: 1.91e-06 tt: 2.49e-04
+      20: f: 1.093558e-01 d: 3.78e-02 g: 2.97e+00 h: 8.43e-02 s: 1.00e+00 e:  1 it: 2.15e-06 tt: 2.52e-04
+      21: f: 6.710346e-02 d: 4.23e-02 g: 1.42e+00 h: 9.64e-02 s: 8.85e-01 e:  1 it: 8.82e-06 tt: 2.81e-04
+      22: f: 3.993377e-02 d: 2.72e-02 g: 2.30e+00 h: 1.29e-01 s: 4.63e-01 e:  2 it: 7.87e-06 tt: 2.96e-04
+      23: f: 2.911794e-02 d: 1.08e-02 g: 2.55e+00 h: 6.55e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.00e-04
+      24: f: 1.457683e-02 d: 1.45e-02 g: 2.77e-01 h: 6.37e-02 s: 6.14e-01 e:  1 it: 1.19e-06 tt: 3.03e-04
+      25: f: 8.577515e-03 d: 6.00e-03 g: 2.86e+00 h: 1.40e-01 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.06e-04
+      26: f: 3.486574e-03 d: 5.09e-03 g: 1.76e-01 h: 1.23e-02 s: 1.00e+00 e:  1 it: 1.19e-06 tt: 3.09e-04
+      27: f: 1.257570e-03 d: 2.23e-03 g: 1.39e-01 h: 5.08e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.12e-04
+      28: f: 2.783568e-04 d: 9.79e-04 g: 6.20e-01 h: 6.47e-02 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.15e-04
+      29: f: 2.533399e-05 d: 2.53e-04 g: 1.68e-02 h: 1.98e-03 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.17e-04
+      30: f: 7.591572e-07 d: 2.46e-05 g: 5.40e-03 h: 9.27e-03 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.20e-04
+      31: f: 1.902460e-09 d: 7.57e-07 g: 1.62e-03 h: 1.89e-03 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.23e-04
+      32: f: 1.003030e-12 d: 1.90e-09 g: 3.50e-05 h: 3.52e-05 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.26e-04
+      33: f: 4.835994e-17 d: 1.00e-12 g: 1.05e-07 h: 1.13e-06 s: 1.00e+00 e:  1 it: 1.19e-06 tt: 3.34e-04
+      34: f: 1.885250e-22 d: 4.84e-17 g: 2.69e-10 h: 1.45e-08 s: 1.00e+00 e:  1 it: 9.54e-07 tt: 3.37e-04
 
-  Solver Summary (v 1.12.0-lapack-suitesparse-cxsparse-no_openmp)
+    Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.1.0)-metis-(5.1.0)-acceleratesparse-eigensparse)
 
-  Parameters                                  2
-  Line search direction              LBFGS (20)
-  Line search type                  CUBIC WOLFE
+    Parameters                                  2
+    Line search direction              LBFGS (20)
+    Line search type                  CUBIC WOLFE
 
 
-  Cost:
-  Initial                          2.420000e+01
-  Final                            1.885250e-22
-  Change                           2.420000e+01
+    Cost:
+    Initial                          2.420000e+01
+    Final                            1.955192e-27
+    Change                           2.420000e+01
 
-  Minimizer iterations                       35
+    Minimizer iterations                       36
 
-  Time (in seconds):
+    Time (in seconds):
 
-    Cost evaluation                       0.000
-    Gradient evaluation                   0.000
-  Total                                   0.003
+      Cost evaluation                    0.000000 (0)
+      Gradient & cost evaluation         0.000000 (44)
+      Polynomial minimization            0.000061
+    Total                                0.000438
 
-  Termination:                      CONVERGENCE (Gradient tolerance reached. Gradient max norm: 9.032775e-13 <= 1.000000e-10)
+    Termination:                      CONVERGENCE (Parameter tolerance reached. Relative step_norm: 1.890726e-11 <= 1.000000e-08.)
+
+    Initial x: -1.2 y: 1
+    Final   x: 1 y: 1
+
+
+
+
+If you are unable to use automatic differentiation for some reason
+(say because you need to call an external library), then you can
+use numeric differentiation. In that case the functor is defined as
+follows [#f2]_.
+
+.. code::
+
+  // f(x,y) = (1-x)^2 + 100(y - x^2)^2;
+  struct Rosenbrock {
+    bool operator()(const double* parameters, double* cost) const {
+      const double x = parameters[0];
+      const double y = parameters[1];
+      cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
+      return true;
+    }
+
+    static ceres::FirstOrderFunction* Create() {
+      constexpr int kNumParameters = 2;
+      return new ceres::NumericDiffFirstOrderFunction<Rosenbrock,
+                                                      ceres::CENTRAL,
+                                                      kNumParameters>();
+    }
+  };
+
+And finally, if you would rather compute the derivatives by hand (say
+because the size of the parameter vector is too large to be
+automatically differentiated). Then you should define an instance of
+`FirstOrderFunction`, which is the analog of :class:`CostFunction` for
+non-linear least squares problems [#f3]_.
+
+.. code::
+
+  // f(x,y) = (1-x)^2 + 100(y - x^2)^2;
+  class Rosenbrock final  : public ceres::FirstOrderFunction {
+    public:
+      bool Evaluate(const double* parameters,
+                             double* cost,
+                             double* gradient) const override {
+         const double x = parameters[0];
+         const double y = parameters[1];
+
+         cost[0] = (1.0 - x) * (1.0 - x) + 100.0 * (y - x * x) * (y - x * x);
+         if (gradient) {
+           gradient[0] = -2.0 * (1.0 - x) - 200.0 * (y - x * x) * 2.0 * x;
+           gradient[1] = 200.0 * (y - x * x);
+         }
+        return true;
+     }
+
+     int NumParameters() const override { return 2; }
+  };
 
 .. rubric:: Footnotes
 
 .. [#f1] `examples/rosenbrock.cc
    <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/rosenbrock.cc>`_
+
+.. [#f2] `examples/rosenbrock_numeric_diff.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/rosenbrock_numeric_diff.cc>`_
+
+.. [#f3] `examples/rosenbrock_analytic_diff.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/rosenbrock_analytic_diff.cc>`_
diff --git a/docs/source/index.rst b/docs/source/index.rst
index d72368f..497b12c 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -44,12 +44,15 @@
 
 If you use Ceres Solver for a publication, please cite it as::
 
-    @misc{ceres-solver,
-      author = "Sameer Agarwal and Keir Mierle and Others",
-      title = "Ceres Solver",
-      howpublished = "\url{http://ceres-solver.org}",
-    }
-
+  @software{Agarwal_Ceres_Solver_2022,
+    author = {Agarwal, Sameer and Mierle, Keir and The Ceres Solver Team},
+    title = {{Ceres Solver}},
+    license = {Apache-2.0},
+    url = {https://github.com/ceres-solver/ceres-solver},
+    version = {2.2},
+    year = {2023},
+    month = {10}
+  }
 
 .. rubric:: Footnotes
 
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
index 7f49783..4feb1a4 100644
--- a/docs/source/installation.rst
+++ b/docs/source/installation.rst
@@ -9,7 +9,7 @@
 .. _section-source:
 
 You can start with the `latest stable release
-<http://ceres-solver.org/ceres-solver-2.0.0.tar.gz>`_ . Or if you want
+<http://ceres-solver.org/ceres-solver-2.2.0.tar.gz>`_ . Or if you want
 the latest version, you can clone the git repository
 
 .. code-block:: bash
@@ -21,15 +21,16 @@
 Dependencies
 ============
 
-  .. NOTE ::
+ .. note ::
 
-    Starting with v2.0 Ceres requires a **fully C++14-compliant**
-    compiler.  In versions <= 1.14, C++11 was an optional requirement.
+    Ceres Solver 2.2 requires a **fully C++17-compliant** compiler.
 
 Ceres relies on a number of open source libraries, some of which are
 optional. For details on customizing the build process, see
 :ref:`section-customizing` .
 
+- `CMake <http://www.cmake.org>`_ 3.16 or later **required**.
+
 - `Eigen <http://eigen.tuxfamily.org/index.php?title=Main_Page>`_
   3.3 or later **required**.
 
@@ -39,9 +40,7 @@
     library. Please see the documentation for ``EIGENSPARSE`` for
     more details.
 
-- `CMake <http://www.cmake.org>`_ 3.5 or later **required**.
-
-- `glog <https://github.com/google/glog>`_ 0.3.1 or
+- `glog <https://github.com/google/glog>`_ 0.3.5 or
   later. **Recommended**
 
   ``glog`` is used extensively throughout Ceres for logging detailed
@@ -65,22 +64,13 @@
   recommend against it. ``miniglog`` has worse performance than
   ``glog`` and is much harder to control and use.
 
-  .. NOTE ::
-
-     If you are compiling ``glog`` from source, please note that
-     currently, the unit tests for ``glog`` (which are enabled by
-     default) do not compile against a default build of ``gflags`` 2.1
-     as the gflags namespace changed from ``google::`` to
-     ``gflags::``.  A patch to fix this is available from `here
-     <https://code.google.com/p/google-glog/issues/detail?id=194>`_.
-
 - `gflags <https://github.com/gflags/gflags>`_. Needed to build
   examples and tests and usually a dependency for glog.
 
-- `SuiteSparse
-  <http://faculty.cse.tamu.edu/davis/suitesparse.html>`_. Needed for
-  solving large sparse linear systems. **Optional; strongly recomended
-  for large scale bundle adjustment**
+- `SuiteSparse <http://faculty.cse.tamu.edu/davis/suitesparse.html>`_
+  4.5.6 or later. Needed for solving large sparse linear
+  systems. **Optional; strongly recommended for large scale bundle
+  adjustment**
 
   .. NOTE ::
 
@@ -90,10 +80,10 @@
      found TBB version. You can customize the searched TBB location
      with the ``TBB_ROOT`` variable.
 
-- `CXSparse <http://faculty.cse.tamu.edu/davis/suitesparse.html>`_.
-  Similar to ``SuiteSparse`` but simpler and slower. CXSparse has
-  no dependencies on ``LAPACK`` and ``BLAS``. This makes for a simpler
-  build process and a smaller binary. **Optional**
+  A CMake native version of SuiteSparse that can be compiled on a variety of
+  platforms (e.g., using Visual Studio, Xcode, MinGW, etc.) is maintained by the
+  `CMake support for SuiteSparse <https://github.com/sergiud/SuiteSparse>`_
+  project.
 
 - `Apple's Accelerate sparse solvers <https://developer.apple.com/documentation/accelerate/sparse_solvers>`_.
   As of Xcode 9.0, Apple's Accelerate framework includes support for
@@ -104,9 +94,13 @@
   ``SuiteSparse``, and optionally used by Ceres directly for some
   operations.
 
-  On ``UNIX`` OSes other than macOS we recommend `ATLAS
+  For best performance on ``x86`` based Linux systems we recommend
+  using `Intel MKL
+  <https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html>`_.
+
+  Two other good options are `ATLAS
   <http://math-atlas.sourceforge.net/>`_, which includes ``BLAS`` and
-  ``LAPACK`` routines. It is also possible to use `OpenBLAS
+  ``LAPACK`` routines and `OpenBLAS
   <https://github.com/xianyi/OpenBLAS>`_ . However, one needs to be
   careful to `turn off the threading
   <https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-multi-threaded>`_
@@ -122,6 +116,15 @@
 
   **Optional but required for** ``SuiteSparse``.
 
+- `CUDA <https://developer.nvidia.com/cuda-toolkit>`_ If you have an
+  NVIDIA GPU then Ceres Solver can use it accelerate the solution of
+  the Gauss-Newton linear systems using the CMake flag ``USE_CUDA``.
+  Currently this support is limited to using the dense linear solvers that ship
+  with ``CUDA``. As a result GPU acceleration can be used to speed up
+  ``DENSE_QR``, ``DENSE_NORMAL_CHOLESKY`` and
+  ``DENSE_SCHUR``. This also enables ``CUDA`` mixed precision solves
+  for ``DENSE_NORMAL_CHOLESKY`` and ``DENSE_SCHUR``.  **Optional**.
+
 .. _section-linux:
 
 Linux
@@ -130,11 +133,12 @@
 We will use `Ubuntu <http://www.ubuntu.com>`_ as our example linux
 distribution.
 
-  .. NOTE ::
+.. NOTE::
 
-    These instructions are for Ubuntu 18.04 and newer. On Ubuntu 16.04
-    you need to manually get a more recent version of Eigen, such as
-    3.3.7.
+   Ceres Solver always supports the previous and current Ubuntu LTS
+   releases, currently 18.04 and 20.04, using the default Ubuntu
+   repositories and compiler toolchain. Support for earlier versions
+   is not guaranteed or maintained.
 
 Start by installing all the dependencies.
 
@@ -144,21 +148,21 @@
      sudo apt-get install cmake
      # google-glog + gflags
      sudo apt-get install libgoogle-glog-dev libgflags-dev
-     # BLAS & LAPACK
+     # Use ATLAS for BLAS & LAPACK
      sudo apt-get install libatlas-base-dev
      # Eigen3
      sudo apt-get install libeigen3-dev
-     # SuiteSparse and CXSparse (optional)
+     # SuiteSparse (optional)
      sudo apt-get install libsuitesparse-dev
 
 We are now ready to build, test, and install Ceres.
 
 .. code-block:: bash
 
- tar zxf ceres-solver-2.0.0.tar.gz
+ tar zxf ceres-solver-2.2.0.tar.gz
  mkdir ceres-bin
  cd ceres-bin
- cmake ../ceres-solver-2.0.0
+ cmake ../ceres-solver-2.2.0
  make -j3
  make test
  # Optionally install Ceres, it can also be exported using CMake which
@@ -172,7 +176,7 @@
 
 .. code-block:: bash
 
- bin/simple_bundle_adjuster ../ceres-solver-2.0.0/data/problem-16-22106-pre.txt
+ bin/simple_bundle_adjuster ../ceres-solver-2.2.0/data/problem-16-22106-pre.txt
 
 This runs Ceres for a maximum of 10 iterations using the
 ``DENSE_SCHUR`` linear solver. The output should look something like
@@ -181,63 +185,63 @@
 .. code-block:: bash
 
     iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
-       0  4.185660e+06    0.00e+00    1.09e+08   0.00e+00   0.00e+00  1.00e+04       0    7.59e-02    3.37e-01
-       1  1.062590e+05    4.08e+06    8.99e+06   5.36e+02   9.82e-01  3.00e+04       1    1.65e-01    5.03e-01
-       2  4.992817e+04    5.63e+04    8.32e+06   3.19e+02   6.52e-01  3.09e+04       1    1.45e-01    6.48e-01
-       3  1.899774e+04    3.09e+04    1.60e+06   1.24e+02   9.77e-01  9.26e+04       1    1.43e-01    7.92e-01
-       4  1.808729e+04    9.10e+02    3.97e+05   6.39e+01   9.51e-01  2.78e+05       1    1.45e-01    9.36e-01
-       5  1.803399e+04    5.33e+01    1.48e+04   1.23e+01   9.99e-01  8.33e+05       1    1.45e-01    1.08e+00
-       6  1.803390e+04    9.02e-02    6.35e+01   8.00e-01   1.00e+00  2.50e+06       1    1.50e-01    1.23e+00
+       0  4.185660e+06    0.00e+00    1.09e+08   0.00e+00   0.00e+00  1.00e+04        0    2.18e-02    6.57e-02
+       1  1.062590e+05    4.08e+06    8.99e+06   0.00e+00   9.82e-01  3.00e+04        1    5.07e-02    1.16e-01
+       2  4.992817e+04    5.63e+04    8.32e+06   3.19e+02   6.52e-01  3.09e+04        1    4.75e-02    1.64e-01
+       3  1.899774e+04    3.09e+04    1.60e+06   1.24e+02   9.77e-01  9.26e+04        1    4.74e-02    2.11e-01
+       4  1.808729e+04    9.10e+02    3.97e+05   6.39e+01   9.51e-01  2.78e+05        1    4.75e-02    2.59e-01
+       5  1.803399e+04    5.33e+01    1.48e+04   1.23e+01   9.99e-01  8.33e+05        1    4.74e-02    3.06e-01
+       6  1.803390e+04    9.02e-02    6.35e+01   8.00e-01   1.00e+00  2.50e+06        1    4.76e-02    3.54e-01
 
-    Ceres Solver v2.0.0 Solve Report
-    ----------------------------------
+    Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.1.0)-metis-(5.1.0)-acceleratesparse-eigensparse)
+
                                          Original                  Reduced
     Parameter blocks                        22122                    22122
     Parameters                              66462                    66462
     Residual blocks                         83718                    83718
-    Residual                               167436                   167436
+    Residuals                              167436                   167436
 
     Minimizer                        TRUST_REGION
 
     Dense linear algebra library            EIGEN
     Trust region strategy     LEVENBERG_MARQUARDT
-
                                             Given                     Used
     Linear solver                     DENSE_SCHUR              DENSE_SCHUR
     Threads                                     1                        1
-    Linear solver threads                       1                        1
-    Linear solver ordering              AUTOMATIC                22106, 16
+    Linear solver ordering              AUTOMATIC                 22106,16
+    Schur structure                         2,3,9                    2,3,9
 
     Cost:
     Initial                          4.185660e+06
     Final                            1.803390e+04
     Change                           4.167626e+06
 
-    Minimizer iterations                        6
-    Successful steps                            6
+    Minimizer iterations                        7
+    Successful steps                            7
     Unsuccessful steps                          0
 
     Time (in seconds):
-    Preprocessor                            0.261
+    Preprocessor                         0.043895
 
-      Residual evaluation                   0.082
-      Jacobian evaluation                   0.412
-      Linear solver                         0.442
-    Minimizer                               1.051
+      Residual only evaluation           0.029855 (7)
+      Jacobian & residual evaluation     0.120581 (7)
+      Linear solver                      0.153665 (7)
+    Minimizer                            0.339275
 
-    Postprocessor                           0.002
-    Total                                   1.357
+    Postprocessor                        0.000540
+    Total                                0.383710
 
-    Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 1.769766e-09 <= 1.000000e-06)
+    Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 1.769759e-09 <= 1.000000e-06)
+
 
 .. section-macos:
 
 macOS
 =====
 
-On macOS, you can either use `Homebrew
-<https://brew.sh/>`_ (recommended) or `MacPorts
-<https://www.macports.org/>`_ to install Ceres Solver.
+On macOS, you can either use `Homebrew <https://brew.sh/>`_
+(recommended) or `MacPorts <https://www.macports.org/>`_ to install
+Ceres Solver.
 
 If using `Homebrew <https://brew.sh/>`_, then
 
@@ -277,17 +281,17 @@
       brew install glog gflags
       # Eigen3
       brew install eigen
-      # SuiteSparse and CXSparse
+      # SuiteSparse
       brew install suite-sparse
 
 We are now ready to build, test, and install Ceres.
 
 .. code-block:: bash
 
-   tar zxf ceres-solver-2.0.0.tar.gz
+   tar zxf ceres-solver-2.2.0.tar.gz
    mkdir ceres-bin
    cd ceres-bin
-   cmake ../ceres-solver-2.0.0
+   cmake ../ceres-solver-2.2.0
    make -j3
    make test
    # Optionally install Ceres, it can also be exported using CMake which
@@ -295,53 +299,59 @@
    # documentation for the EXPORT_BUILD_DIR option for more information.
    make install
 
-Building with OpenMP on macOS
------------------------------
-
-Up to at least Xcode 12, OpenMP support was disabled in Apple's version of
-Clang.  However, you can install the latest version of the LLVM toolchain
-from Homebrew which does support OpenMP, and thus build Ceres with OpenMP
-support on macOS.  To do this, you must install llvm via Homebrew:
-
-.. code-block:: bash
-
-      # Install latest version of LLVM toolchain.
-      brew install llvm
-
-As the LLVM formula in Homebrew is keg-only, it will not be installed to
-``/usr/local`` to avoid conflicts with the standard Apple LLVM toolchain.
-To build Ceres with the Homebrew LLVM toolchain you should do the
-following:
-
-.. code-block:: bash
-
-   tar zxf ceres-solver-2.0.0.tar.gz
-   mkdir ceres-bin
-   cd ceres-bin
-   # Configure the local shell only (not persistent) to use the Homebrew LLVM
-   # toolchain in favour of the default Apple version.  This is taken
-   # verbatim from the instructions output by Homebrew when installing the
-   # llvm formula.
-   export LDFLAGS="-L/usr/local/opt/llvm/lib -Wl,-rpath,/usr/local/opt/llvm/lib"
-   export CPPFLAGS="-I/usr/local/opt/llvm/include"
-   export PATH="/usr/local/opt/llvm/bin:$PATH"
-   # Force CMake to use the Homebrew version of Clang and enable OpenMP.
-   cmake -DCMAKE_C_COMPILER=/usr/local/opt/llvm/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/opt/llvm/bin/clang++ -DCERES_THREADING_MODEL=OPENMP ../ceres-solver-2.0.0
-   make -j3
-   make test
-   # Optionally install Ceres.  It can also be exported using CMake which
-   # allows Ceres to be used without requiring installation.  See the
-   # documentation for the EXPORT_BUILD_DIR option for more information.
-   make install
-
-Like the Linux build, you should now be able to run
-``bin/simple_bundle_adjuster``.
-
 .. _section-windows:
 
 Windows
 =======
 
+Using a Library Manager
+-----------------------
+
+`vcpkg <https://github.com/microsoft/vcpkg>`_ is a library manager for Microsoft
+Windows that can be used to install Ceres Solver and all its dependencies.
+
+#. Install the library manager into a top-level directory ``vcpkg/`` on Windows
+   following the `guide
+   <https://github.com/microsoft/vcpkg#quick-start-windows>`_, e.g., using
+   Visual Studio 2022 community edition, or simply run
+
+    .. code:: bat
+
+        git clone https://github.com/Microsoft/vcpkg.git
+        cd vcpkg
+        .\bootstrap-vcpkg.bat
+        .\vcpkg integrate install
+
+#. Use vcpkg to install and build Ceres and all its dependencies, e.g., for 64
+   bit Windows
+
+   .. code:: bat
+
+      vcpkg\vcpkg.exe install ceres:x64-windows
+
+   Or with optional components, e.g., SuiteSparse, using
+
+   .. code:: bat
+
+      vcpkg\vcpkg.exe install ceres[suitesparse]:x64-windows
+
+#. Integrate vcpkg packages with Visual Studio to allow it to automatically
+   find all the libraries installed by vcpkg.
+
+   .. code:: bat
+
+      vcpkg\vcpkg.exe integrate install
+
+#. To use Ceres in a CMake project, follow our :ref:`instructions
+   <section-using-ceres>`.
+
+
+Building from Source
+--------------------
+
+Ceres Solver can also be built from source. For this purpose, we support Visual
+Studio 2019 and newer.
+
 .. NOTE::
 
   If you find the following CMake difficult to set up, then you may
@@ -349,36 +359,11 @@
   <https://github.com/tbennun/ceres-windows>`_ for Ceres Solver by Tal
   Ben-Nun.
 
-On Windows, we support building with Visual Studio 2015.2 of newer. Note
-that the Windows port is less featureful and less tested than the
-Linux or macOS versions due to the lack of an officially supported
-way of building SuiteSparse and CXSparse.  There are however a number
-of unofficial ways of building these libraries. Building on Windows
-also a bit more involved since there is no automated way to install
-dependencies.
+#. Create a top-level directory for dependencies, build, and sources somewhere,
+   e.g., ``ceres/``
 
-.. NOTE:: Using ``google-glog`` & ``miniglog`` with windows.h.
-
- The windows.h header if used with GDI (Graphics Device Interface)
- defines ``ERROR``, which conflicts with the definition of ``ERROR``
- as a LogSeverity level in ``google-glog`` and ``miniglog``.  There
- are at least two possible fixes to this problem:
-
- #. Use ``google-glog`` and define ``GLOG_NO_ABBREVIATED_SEVERITIES``
-    when building Ceres and your own project, as documented `here
-    <http://google-glog.googlecode.com/svn/trunk/doc/glog.html>`__.
-    Note that this fix will not work for ``miniglog``, but use of
-    ``miniglog`` is strongly discouraged on any platform for which
-    ``google-glog`` is available (which includes Windows).
- #. If you do not require GDI, then define ``NOGDI`` **before**
-    including windows.h.  This solution should work for both
-    ``google-glog`` and ``miniglog`` and is documented for
-    ``google-glog`` `here
-    <https://code.google.com/p/google-glog/issues/detail?id=33>`__.
-
-#. Make a toplevel directory for deps & build & src somewhere: ``ceres/``
 #. Get dependencies; unpack them as subdirectories in ``ceres/``
-   (``ceres/eigen``, ``ceres/glog``, etc)
+   (``ceres/eigen``, ``ceres/glog``, etc.)
 
    #. ``Eigen`` 3.3 . Configure and optionally install Eigen. It should be
       exported into the CMake package registry by default as part of the
@@ -394,45 +379,55 @@
       project.  If you wish to use ``SuiteSparse``, follow their
       instructions for obtaining and building it.
 
-   #. (Experimental) ``CXSparse`` Previously CXSparse was not
-      available on Windows, there are now several ports that enable it
-      to be, including: `[1] <https://github.com/PetterS/CXSparse>`_
-      and `[2] <https://github.com/TheFrenchLeaf/CXSparse>`_.  If you
-      wish to use ``CXSparse``, follow their instructions for
-      obtaining and building it.
+      Alternatively, Ceres Solver supports ``SuiteSparse`` binary
+      packages available for Visual Studio 2019 and 2022 provided by
+      the `CMake support for SuiteSparse
+      <https://github.com/sergiud/SuiteSparse>`_ project that also
+      include `reference LAPACK <http://www.netlib.org/blas>`_ (and
+      BLAS). The binary packages are used by Ceres Solver for
+      continuous testing on Github.
 
 #. Unpack the Ceres tarball into ``ceres``. For the tarball, you
    should get a directory inside ``ceres`` similar to
-   ``ceres-solver-2.0.0``. Alternately, checkout Ceres via ``git`` to
+   ``ceres-solver-2.2.0``. Alternately, checkout Ceres via ``git`` to
    get ``ceres-solver.git`` inside ``ceres``.
 
 #. Install ``CMake``,
 
-#. Make a dir ``ceres/ceres-bin`` (for an out-of-tree build)
+#. Create a directory ``ceres/ceres-bin`` (for an out-of-tree build)
+
+   #. If you use the above binary ``SuiteSparse`` package, make sure CMake can
+      find it, e.g., by assigning the path of the directory that contains the
+      unzipped contents to the ``CMAKE_PREFIX_PATH`` environment variable. In a
+      Windows command prompt this can be achieved as follows:
+
+      .. code:: bat
+
+        export CMAKE_PREFIX_PATH=C:/Downloads/SuiteSparse-5.11.0-cmake.1-vc16-Win64-Release-shared-gpl
 
 #. Run ``CMake``; select the ``ceres-solver-X.Y.Z`` or
    ``ceres-solver.git`` directory for the CMake file. Then select the
-   ``ceres-bin`` for the build dir.
+   ``ceres-bin`` for the build directory.
 
-#. Try running ``Configure``. It won't work. It'll show a bunch of options.
-   You'll need to set:
+#. Try running ``Configure`` which can fail at first because some dependencies
+   cannot be automatically located. In this case, you must set the following
+   CMake variables to the appropriate directories where you unpacked/built them:
 
    #. ``Eigen3_DIR`` (Set to directory containing ``Eigen3Config.cmake``)
    #. ``GLOG_INCLUDE_DIR_HINTS``
    #. ``GLOG_LIBRARY_DIR_HINTS``
    #. (Optional) ``gflags_DIR`` (Set to directory containing ``gflags-config.cmake``)
-   #. (Optional) ``SUITESPARSE_INCLUDE_DIR_HINTS``
-   #. (Optional) ``SUITESPARSE_LIBRARY_DIR_HINTS``
-   #. (Optional) ``CXSPARSE_INCLUDE_DIR_HINTS``
-   #. (Optional) ``CXSPARSE_LIBRARY_DIR_HINTS``
+   #. (SuiteSparse binary package) ``BLAS_blas_LIBRARY`` and
+      ``LAPACK_lapack_LIBRARY`` CMake variables must be `explicitly set` to
+      ``<path>/lib/blas.lib`` and ``<path>/lib/lapack.lib``, respectively, both
+      located in the unzipped package directory ``<path>``.
 
-   to the appropriate directories where you unpacked/built them. If
-   any of the variables are not visible in the ``CMake`` GUI, create a
-   new entry for them.  We recommend using the
-   ``<NAME>_(INCLUDE/LIBRARY)_DIR_HINTS`` variables rather than
-   setting the ``<NAME>_INCLUDE_DIR`` & ``<NAME>_LIBRARY`` variables
-   directly to keep all of the validity checking, and to avoid having
-   to specify the library files manually.
+   If any of the variables are not visible in the ``CMake`` GUI, create a new
+   entry for them.  We recommend using the
+   ``<NAME>_(INCLUDE/LIBRARY)_DIR_HINTS`` variables rather than setting the
+   ``<NAME>_INCLUDE_DIR`` & ``<NAME>_LIBRARY`` variables directly to keep all of
+   the validity checking, and to avoid having to specify the library files
+   manually.
 
 #. You may have to tweak some more settings to generate a MSVC
    project.  After each adjustment, try pressing Configure & Generate
@@ -447,17 +442,15 @@
 Like the Linux build, you should now be able to run
 ``bin/simple_bundle_adjuster``.
 
-Notes:
+.. note::
 
-#. The default build is Debug; consider switching it to release mode.
-#. Currently ``system_test`` is not working properly.
-#. CMake puts the resulting test binaries in ``ceres-bin/examples/Debug``
-   by default.
-#. The solvers supported on Windows are ``DENSE_QR``, ``DENSE_SCHUR``,
-   ``CGNR``, and ``ITERATIVE_SCHUR``.
-#. We're looking for someone to work with upstream ``SuiteSparse`` to
-   port their build system to something sane like ``CMake``, and get a
-   fully supported Windows port.
+    #. The default build is ``Debug``; consider switching it to ``Release`` for
+       optimal performance.
+    #. CMake puts the resulting test binaries in ``ceres-bin/examples/Debug`` by
+       default.
+    #. Without a sparse linear algebra library, only a subset of
+       solvers is usable, namely: ``DENSE_QR``, ``DENSE_SCHUR``,
+       ``CGNR``, and ``ITERATIVE_SCHUR``.
 
 
 .. _section-android:
@@ -556,9 +549,7 @@
 The default CMake configuration builds a bare bones version of Ceres
 Solver that only depends on Eigen (``MINIGLOG`` is compiled into Ceres
 if it is used), this should be sufficient for solving small to
-moderate sized problems (No ``SPARSE_SCHUR``,
-``SPARSE_NORMAL_CHOLESKY`` linear solvers and no ``CLUSTER_JACOBI``
-and ``CLUSTER_TRIDIAGONAL`` preconditioners).
+moderate sized problems.
 
 If you decide to use ``LAPACK`` and ``BLAS``, then you also need to
 add ``Accelerate.framework`` to your Xcode project's linking
@@ -648,22 +639,14 @@
       terms.  Ceres requires some components that are only licensed under
       GPL/Commercial terms.
 
-#. ``CXSPARSE [Default: ON]``: By default, Ceres will link to
-   ``CXSparse`` if all its dependencies are present. Turn this ``OFF``
-   to build Ceres without ``CXSparse``.
-
-   .. NOTE::
-
-      CXSparse is licensed under the LGPL.
-
 #. ``ACCELERATESPARSE [Default: ON]``: By default, Ceres will link to
    Apple's Accelerate framework directly if a version of it is detected
    which supports solving sparse linear systems.  Note that on Apple OSs
    Accelerate usually also provides the BLAS/LAPACK implementations and
    so would be linked against irrespective of the value of ``ACCELERATESPARSE``.
 
-#. ``EIGENSPARSE [Default: ON]``: By default, Ceres will not use
-   Eigen's sparse Cholesky factorization.
+#. ``EIGENSPARSE [Default: ON]``: By default, Ceres will use Eigen's
+   sparse Cholesky factorization.
 
 #. ``GFLAGS [Default: ON]``: Turn this ``OFF`` to build Ceres without
    ``gflags``. This will also prevent some of the example code from
@@ -680,11 +663,6 @@
    gains in the ``SPARSE_SCHUR`` solver, you can disable some of the
    template specializations by turning this ``OFF``.
 
-#. ``CERES_THREADING_MODEL [Default: CXX_THREADS > OPENMP > NO_THREADS]``:
-   Multi-threading backend Ceres should be compiled with.  This will
-   automatically be set to only accept the available subset of threading
-   options in the CMake GUI.
-
 #. ``BUILD_SHARED_LIBS [Default: OFF]``: By default Ceres is built as
    a static library, turn this ``ON`` to instead build Ceres as a
    shared library.
@@ -701,8 +679,8 @@
 
 #. ``BUILD_DOCUMENTATION [Default: OFF]``: Use this to enable building
    the documentation, requires `Sphinx <http://sphinx-doc.org/>`_ and
-   the `sphinx-better-theme
-   <https://pypi.python.org/pypi/sphinx-better-theme>`_ package
+   the `sphinx-rtd-theme
+   <https://pypi.org/project/sphinx-rtd-theme/>`_ package
    available from the Python package index. In addition, ``make
    ceres_docs`` can be used to build only the documentation.
 
@@ -865,25 +843,18 @@
 
 #. ``SuiteSparse``: Ceres built with SuiteSparse (``SUITESPARSE=ON``).
 
-#. ``CXSparse``: Ceres built with CXSparse (``CXSPARSE=ON``).
-
 #. ``AccelerateSparse``: Ceres built with Apple's Accelerate sparse solvers (``ACCELERATESPARSE=ON``).
 
 #. ``EigenSparse``: Ceres built with Eigen's sparse Cholesky factorization
    (``EIGENSPARSE=ON``).
 
-#. ``SparseLinearAlgebraLibrary``: Ceres built with *at least one* sparse linear
-   algebra library.  This is equivalent to ``SuiteSparse`` **OR** ``CXSparse``
-   **OR** ``AccelerateSparse``  **OR** ``EigenSparse``.
+#. ``SparseLinearAlgebraLibrary``: Ceres built with *at least one*
+   sparse linear algebra library.  This is equivalent to
+   ``SuiteSparse`` **OR** ``AccelerateSparse`` **OR** ``EigenSparse``.
 
 #. ``SchurSpecializations``: Ceres built with Schur specializations
    (``SCHUR_SPECIALIZATIONS=ON``).
 
-#. ``OpenMP``: Ceres built with OpenMP (``CERES_THREADING_MODEL=OPENMP``).
-
-#. ``Multithreading``: Ceres built with *a* multithreading library.
-   This is equivalent to (``CERES_THREAD != NO_THREADS``).
-
 To specify one/multiple Ceres components use the ``COMPONENTS`` argument to
 `find_package()
 <http://www.cmake.org/cmake/help/v3.5/command/find_package.html>`_ like so:
diff --git a/docs/source/interfacing_with_autodiff.rst b/docs/source/interfacing_with_autodiff.rst
index 02f58b2..fa05835 100644
--- a/docs/source/interfacing_with_autodiff.rst
+++ b/docs/source/interfacing_with_autodiff.rst
@@ -37,7 +37,7 @@
    template <typename T> T TemplatedComputeDistortion(const T r2) {
      const double k1 = 0.0082;
      const double k2 = 0.000023;
-     return 1.0 + k1 * y2 + k2 * r2 * r2;
+     return 1.0 + k1 * r2 + k2 * r2 * r2;
    }
 
    struct Affine2DWithDistortion {
@@ -118,12 +118,13 @@
        y[0] = y_in[0];
        y[1] = y_in[1];
 
-       compute_distortion.reset(new ceres::CostFunctionToFunctor<1, 1>(
-            new ceres::NumericDiffCostFunction<ComputeDistortionValueFunctor,
-                                               ceres::CENTRAL,
-                                               1,
-                                               1>(
-               new ComputeDistortionValueFunctor)));
+       compute_distortion = std::make_unique<ceres::CostFunctionToFunctor<1, 1>>(
+         std::make_unique<ceres::NumericDiffCostFunction<
+               ComputeDistortionValueFunctor
+             , ceres::CENTRAL, 1, 1
+           >
+         >()
+       );
      }
 
      template <typename T>
@@ -140,7 +141,7 @@
 
      double x[2];
      double y[2];
-     std::unique_ptr<ceres::CostFunctionToFunctor<1, 1> > compute_distortion;
+     std::unique_ptr<ceres::CostFunctionToFunctor<1, 1>> compute_distortion;
    };
 
 
@@ -148,7 +149,7 @@
 ------------------------------------------------
 
 Now suppose we are given a function :code:`ComputeDistortionValue`
-thatis able to compute its value and optionally its Jacobian on demand
+that is able to compute its value and optionally its Jacobian on demand
 and has the following signature:
 
 .. code-block:: c++
diff --git a/docs/source/inverse_and_implicit_function_theorems.rst b/docs/source/inverse_and_implicit_function_theorems.rst
new file mode 100644
index 0000000..7d8f7fa
--- /dev/null
+++ b/docs/source/inverse_and_implicit_function_theorems.rst
@@ -0,0 +1,214 @@
+.. default-domain:: cpp
+
+.. cpp:namespace:: ceres
+
+.. _chapter-inverse_function_theorem:
+
+==========================================
+Using Inverse & Implicit Function Theorems
+==========================================
+
+Until now we have considered methods for computing derivatives that
+work directly on the function being differentiated. However, this is
+not always possible. For example, if the function can only be computed
+via an iterative algorithm, or there is no explicit definition of the
+function available.  In this section we will see how we can use two
+basic results from calculus to get around these difficulties.
+
+
+Inverse Function Theorem
+========================
+
+Suppose we wish to evaluate the derivative of a function :math:`f(x)`,
+but evaluating :math:`f(x)` is not easy. Say it involves running an
+iterative algorithm. You could try automatically differentiating the
+iterative algorithm, but even if that is possible, it can become quite
+expensive.
+
+In some cases we get lucky, and computing the inverse of :math:`f(x)`
+is an easy operation. In these cases, we can use the `Inverse Function
+Theorem <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to
+compute the derivative exactly. Here is the key idea:
+
+Assuming that :math:`y=f(x)` is continuously differentiable in a
+neighborhood of a point :math:`x` and :math:`Df(x)` is the invertible
+Jacobian of :math:`f` at :math:`x`, then by applying the chain rule to
+the identity :math:`f^{-1}(f(x)) = x`, we have
+:math:`Df^{-1}(f(x))Df(x) = I`, or :math:`Df^{-1}(y) = (Df(x))^{-1}`,
+i.e., the Jacobian of :math:`f^{-1}` is the inverse of the Jacobian of
+:math:`f`, or :math:`Df(x) = (Df^{-1}(y))^{-1}`.
+
+For example, let :math:`f(x) = e^x`. Now of course we know that
+:math:`Df(x) = e^x`, but let's try and compute it via the Inverse
+Function Theorem. For :math:`x > 0`, we have :math:`f^{-1}(y) = \log
+y`, so :math:`Df^{-1}(y) = \frac{1}{y}`, so :math:`Df(x) =
+(Df^{-1}(y))^{-1} = y = e^x`.
+
+You maybe wondering why the above is true. A smoothly differentiable
+function in a small neighborhood is well approximated by a linear
+function. Indeed this is a good way to think about the Jacobian, it is
+the matrix that best approximates the function linearly. Once you do
+that, it is straightforward to see that *locally* :math:`f^{-1}(y)` is
+best approximated linearly by the inverse of the Jacobian of
+:math:`f(x)`.
+
+Let us now consider a more practical example.
+
+Geodetic Coordinate System Conversion
+-------------------------------------
+
+When working with data related to the Earth, one can use two different
+coordinate systems. The familiar (latitude, longitude, height)
+Latitude-Longitude-Altitude coordinate system or the `ECEF
+<http://en.wikipedia.org/wiki/ECEF>`_ coordinate systems. The former
+is familiar but is not terribly convenient analytically. The latter is
+a Cartesian system but not particularly intuitive. So systems that
+process earth related data have to go back and forth between these
+coordinate systems.
+
+The conversion between the LLA and the ECEF coordinate system requires
+a model of the Earth, the most commonly used one being `WGS84
+<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.
+
+Going from the spherical :math:`(\phi,\lambda,h)` to the ECEF
+:math:`(x,y,z)` coordinates is easy.
+
+.. math::
+
+   \chi &= \sqrt{1 - e^2 \sin^2 \phi}
+
+   X &= \left( \frac{a}{\chi} + h \right) \cos \phi \cos \lambda
+
+   Y &= \left( \frac{a}{\chi} + h \right) \cos \phi \sin \lambda
+
+   Z &= \left(\frac{a(1-e^2)}{\chi}  +h \right) \sin \phi
+
+Here :math:`a` and :math:`e^2` are constants defined by `WGS84
+<https://en.wikipedia.org/wiki/World_Geodetic_System#1984_version>`_.
+
+Going from ECEF to LLA coordinates requires an iterative algorithm. So
+to compute the derivative of the this transformation we invoke the
+Inverse Function Theorem as follows:
+
+.. code-block:: c++
+
+   Eigen::Vector3d ecef; // Fill some values
+   // Iterative computation.
+   Eigen::Vector3d lla = ECEFToLLA(ecef);
+   // Analytic derivatives
+   Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
+   bool invertible;
+   Eigen::Matrix3d ecef_to_lla_jacobian;
+   lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);
+
+
+Implicit Function Theorem
+=========================
+
+Consider now the problem where we have two variables :math:`x \in
+\mathbb{R}^m` and :math:`y \in \mathbb{R}^n` and a function
+:math:`F:\mathbb{R}^m \times \mathbb{R}^n \rightarrow \mathbb{R}^n`
+such that :math:`F(x,y) = 0` and we wish to calculate the Jacobian of
+:math:`y` with respect to `x`. How do we do this?
+
+If for a given value of :math:`(x,y)`, the partial Jacobian
+:math:`D_2F(x,y)` is full rank, then the `Implicit Function Theorem
+<https://en.wikipedia.org/wiki/Implicit_function_theorem>`_ tells us
+that there exists a neighborhood of :math:`x` and a function :math:`G`
+such :math:`y = G(x)` in this neighborhood. Differentiating
+:math:`F(x,G(x)) = 0` gives us
+
+.. math::
+
+   D_1F(x,y) + D_2F(x,y)DG(x) &= 0
+
+                        DG(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)
+
+                        D y(x) &= -(D_2F(x,y))^{-1} D_1 F(x,y)
+
+This means that we can compute the derivative of :math:`y` with
+respect to :math:`x` by multiplying the Jacobian of :math:`F` w.r.t
+:math:`x` by the inverse of the Jacobian of :math:`F` w.r.t :math:`y`.
+
+Let's consider two examples.
+
+Roots of a Polynomial
+---------------------
+
+The first example we consider is a classic. Let :math:`p(x) = a_0 +
+a_1 x + \dots + a_n x^n` be a degree :math:`n` polynomial, and we wish
+to compute the derivative of its roots with respect to its
+coefficients. There is no closed form formula for computing the roots
+of a general degree :math:`n` polynomial. `Galois
+<https://en.wikipedia.org/wiki/%C3%89variste_Galois>`_ and `Abel
+<https://en.wikipedia.org/wiki/Niels_Henrik_Abel>`_ proved that. There
+are numerical algorithms like computing the eigenvalues of the
+`Companion Matrix
+<https://nhigham.com/2021/03/23/what-is-a-companion-matrix/>`_, but
+differentiating an eigenvalue solver does not seem like fun. But the
+Implicit Function Theorem offers us a simple path.
+
+If :math:`x` is a root of :math:`p(x)`, then :math:`F(\mathbf{a}, x) =
+a_0 + a_1 x + \dots + a_n x^n = 0`. So,
+
+.. math::
+
+   D_1 F(\mathbf{a}, x) &= [1, x, x^2, \dots, x^n]
+
+   D_2 F(\mathbf{a}, x) &= \sum_{k=1}^n k a_k x^{k-1} = Dp(x)
+
+        Dx(a) &= \frac{-1}{Dp(x)} [1, x, x^2, \dots, x^n]
+
+Differentiating the Solution to an Optimization Problem
+-------------------------------------------------------
+
+Sometimes we are required to solve optimization problems inside
+optimization problems, and this requires computing the derivative of
+the optimal solution (or a fixed point) of an optimization problem
+w.r.t its parameters.
+
+Let :math:`\theta \in \mathbb{R}^m` be a vector, :math:`A(\theta) \in
+\mathbb{R}^{k\times n}` be a matrix whose entries are a function of
+:math:`\theta` with :math:`k \ge n` and let :math:`b \in \mathbb{R}^k`
+be a constant vector, then consider the linear least squares problem:
+
+.. math::
+
+   x^* = \arg \min_x \|A(\theta) x - b\|_2^2
+
+How do we compute :math:`D_\theta x^*(\theta)`?
+
+One approach would be to observe that :math:`x^*(\theta) =
+(A^\top(\theta)A(\theta))^{-1}A^\top(\theta)b` and then differentiate
+this w.r.t :math:`\theta`. But this would require differentiating
+through the inverse of the matrix
+:math:`(A^\top(\theta)A(\theta))^{-1}`. Not exactly easy. Let's use
+the Implicit Function Theorem instead.
+
+The first step is to observe that :math:`x^*` satisfies the so called
+*normal equations*.
+
+.. math::
+
+   A^\top(\theta)A(\theta)x^* - A^\top(\theta)b = 0
+
+We will compute :math:`D_\theta x^*` column-wise, treating
+:math:`A(\theta)` as a function of one coordinate (:math:`\theta_i`)
+of :math:`\theta` at a time. So using the normal equations, let's
+define :math:`F(\theta_i, x^*) = A^\top(\theta_i)A(\theta_i)x^* -
+A^\top(\theta_i)b = 0`. Using which can now compute:
+
+.. math::
+
+   D_1F(\theta_i, x^*) &= D_{\theta_i}A^\top A + A^\top
+   D_{\theta_i}Ax^* - D_{\theta_i} A^\top b = g_i
+
+   D_2F(\theta_i, x^*) &= A^\top A
+
+   Dx^*(\theta_i) & = -(A^\top A)^{-1} g_i
+
+   Dx^*(\theta) & = -(A^\top A )^{-1} \left[g_1, \dots, g_m\right]
+
+Observe that we only need to compute the inverse of :math:`A^\top A`,
+to compute :math:`D x^*(\theta)`, which we needed anyways to compute
+:math:`x^*`.
diff --git a/docs/source/license.rst b/docs/source/license.rst
index a3c55c9..ed85f6a 100644
--- a/docs/source/license.rst
+++ b/docs/source/license.rst
@@ -12,7 +12,7 @@
 
 Ceres Solver is licensed under the New BSD license, whose terms are as follows.
 
-Copyright 2016 Google Inc. All rights reserved.
+Copyright 2023 Google Inc. All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
diff --git a/docs/source/loss.png b/docs/source/loss.png
index 9f98d00..5c9ac07 100644
--- a/docs/source/loss.png
+++ b/docs/source/loss.png
Binary files differ
diff --git a/docs/source/modeling_faqs.rst b/docs/source/modeling_faqs.rst
index a0c8f2f..0d23de4 100644
--- a/docs/source/modeling_faqs.rst
+++ b/docs/source/modeling_faqs.rst
@@ -37,7 +37,7 @@
    automatic and numeric differentiation. See
    :class:`CostFunctionToFunctor`.
 
-#. When using Quaternions,  consider using :class:`QuaternionParameterization`.
+#. When using Quaternions,  consider using :class:`QuaternionManifold`.
 
    `Quaternions <https://en.wikipedia.org/wiki/Quaternion>`_ are a
    four dimensional parameterization of the space of three dimensional
@@ -47,14 +47,14 @@
    associate a local parameterization with parameter blocks
    representing a Quaternion. Assuming that the order of entries in
    your parameter block is :math:`w,x,y,z`, you can use
-   :class:`QuaternionParameterization`.
+   :class:`QuaternionManifold`.
 
    .. NOTE::
 
      If you are using `Eigen's Quaternion
      <http://eigen.tuxfamily.org/dox/classEigen_1_1Quaternion.html>`_
      object, whose layout is :math:`x,y,z,w`, then you should use
-     :class:`EigenQuaternionParameterization`.
+     :class:`EigenQuaternionManifold`.
 
 
 #. How do I solve problems with general linear & non-linear
@@ -85,50 +85,4 @@
 
 #. How do I set one or more components of a parameter block constant?
 
-   Using :class:`SubsetParameterization`.
-
-#. Putting `Inverse Function Theorem
-   <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ to use.
-
-   Every now and then we have to deal with functions which cannot be
-   evaluated analytically. Computing the Jacobian in such cases is
-   tricky. A particularly interesting case is where the inverse of the
-   function is easy to compute analytically. An example of such a
-   function is the Coordinate transformation between the `ECEF
-   <http://en.wikipedia.org/wiki/ECEF>`_ and the `WGS84
-   <http://en.wikipedia.org/wiki/World_Geodetic_System>`_ where the
-   conversion from WGS84 to ECEF is analytic, but the conversion
-   back to WGS84 uses an iterative algorithm. So how do you compute the
-   derivative of the ECEF to WGS84 transformation?
-
-   One obvious approach would be to numerically
-   differentiate the conversion function. This is not a good idea. For
-   one, it will be slow, but it will also be numerically quite
-   bad.
-
-   Turns out you can use the `Inverse Function Theorem
-   <http://en.wikipedia.org/wiki/Inverse_function_theorem>`_ in this
-   case to compute the derivatives more or less analytically.
-
-   The key result here is. If :math:`x = f^{-1}(y)`, and :math:`Df(x)`
-   is the invertible Jacobian of :math:`f` at :math:`x`. Then the
-   Jacobian :math:`Df^{-1}(y) = [Df(x)]^{-1}`, i.e., the Jacobian of
-   the :math:`f^{-1}` is the inverse of the Jacobian of :math:`f`.
-
-   Algorithmically this means that given :math:`y`, compute :math:`x =
-   f^{-1}(y)` by whatever means you can. Evaluate the Jacobian of
-   :math:`f` at :math:`x`. If the Jacobian matrix is invertible, then
-   its inverse is the Jacobian of :math:`f^{-1}(y)` at  :math:`y`.
-
-   One can put this into practice with the following code fragment.
-
-   .. code-block:: c++
-
-      Eigen::Vector3d ecef; // Fill some values
-      // Iterative computation.
-      Eigen::Vector3d lla = ECEFToLLA(ecef);
-      // Analytic derivatives
-      Eigen::Matrix3d lla_to_ecef_jacobian = LLAToECEFJacobian(lla);
-      bool invertible;
-      Eigen::Matrix3d ecef_to_lla_jacobian;
-      lla_to_ecef_jacobian.computeInverseWithCheck(ecef_to_lla_jacobian, invertible);
+   Using :class:`SubsetManifold`.
diff --git a/docs/source/nnls_covariance.rst b/docs/source/nnls_covariance.rst
index 66afd44..f95d246 100644
--- a/docs/source/nnls_covariance.rst
+++ b/docs/source/nnls_covariance.rst
@@ -1,3 +1,4 @@
+.. highlight:: c++
 
 .. default-domain:: cpp
 
@@ -115,7 +116,7 @@
      four dimensional quaternion used to parameterize :math:`SO(3)`,
      which is a three dimensional manifold. In cases like this, the
      user should use an appropriate
-     :class:`LocalParameterization`. Not only will this lead to better
+     :class:`Manifold`. Not only will this lead to better
      numerical behaviour of the Solver, it will also expose the rank
      deficiency to the :class:`Covariance` object so that it can
      handle it correctly.
@@ -166,7 +167,7 @@
       moderately fast algorithm suitable for small to medium sized
       matrices. For best performance we recommend using
       ``SuiteSparseQR`` which is enabled by setting
-      :member:`Covaraince::Options::sparse_linear_algebra_library_type`
+      :member:`Covariance::Options::sparse_linear_algebra_library_type`
       to ``SUITE_SPARSE``.
 
       ``SPARSE_QR`` cannot compute the covariance if the
@@ -187,6 +188,23 @@
       well as rank deficient Jacobians.
 
 
+.. member:: double Covariance::Options::column_pivot_threshold
+
+   Default: :math:`-1`
+
+    During QR factorization, if a column with Euclidean norm less than
+    ``column_pivot_threshold`` is encountered it is treated as zero.
+
+    If ``column_pivot_threshold < 0``, then an automatic default value
+    of `20*(m+n)*eps*sqrt(max(diag(J’*J)))` is used.  Here `m` and `n`
+    are the number of rows and columns of the Jacobian (`J`)
+    respectively.
+
+    This is an advanced option meant for users who know enough about
+    their Jacobian matrices that they can determine a value better
+    than the default.
+
+
 .. member:: int Covariance::Options::min_reciprocal_condition_number
 
    Default: :math:`10^{-14}`
@@ -221,7 +239,7 @@
       .. math:: \frac{\sigma_{\text{min}}}{\sigma_{\text{max}}}  < \sqrt{\text{min_reciprocal_condition_number}}
 
       where :math:`\sigma_{\text{min}}` and
-      :math:`\sigma_{\text{max}}` are the minimum and maxiumum
+      :math:`\sigma_{\text{max}}` are the minimum and maximum
       singular values of :math:`J` respectively.
 
    2. ``SPARSE_QR``
@@ -285,7 +303,7 @@
    entire documentation for :class:`Covariance::Options` before using
    :class:`Covariance`.
 
-.. function:: bool Covariance::Compute(const vector<pair<const double*, const double*> >& covariance_blocks, Problem* problem)
+.. function:: bool Covariance::Compute(const std::vector<std::pair<const double*, const double*> >& covariance_blocks, Problem* problem)
 
    Compute a part of the covariance matrix.
 
@@ -361,7 +379,7 @@
  Covariance::Options options;
  Covariance covariance(options);
 
- vector<pair<const double*, const double*> > covariance_blocks;
+ std::vector<std::pair<const double*, const double*> > covariance_blocks;
  covariance_blocks.push_back(make_pair(x, x));
  covariance_blocks.push_back(make_pair(y, y));
  covariance_blocks.push_back(make_pair(x, y));
diff --git a/docs/source/nnls_modeling.rst b/docs/source/nnls_modeling.rst
index c0c3227..be87149 100644
--- a/docs/source/nnls_modeling.rst
+++ b/docs/source/nnls_modeling.rst
@@ -1,3 +1,5 @@
+.. highlight:: c++
+
 .. default-domain:: cpp
 
 .. cpp:namespace:: ceres
@@ -50,7 +52,7 @@
 
 As a special case, when :math:`\rho_i(x) = x`, i.e., the identity
 function, and :math:`l_j = -\infty` and :math:`u_j = \infty` we get
-the more familiar unconstrained `non-linear least squares problem
+the usual unconstrained `non-linear least squares problem
 <http://en.wikipedia.org/wiki/Non-linear_least_squares>`_.
 
 .. math:: :label: ceresproblemunconstrained
@@ -80,12 +82,12 @@
      public:
       virtual bool Evaluate(double const* const* parameters,
                             double* residuals,
-                            double** jacobians) = 0;
-      const vector<int32>& parameter_block_sizes();
+                            double** jacobians) const = 0;
+      const std::vector<int32>& parameter_block_sizes();
       int num_residuals() const;
 
      protected:
-      vector<int32>* mutable_parameter_block_sizes();
+      std::vector<int32>* mutable_parameter_block_sizes();
       void set_num_residuals(int num_residuals);
     };
 
@@ -98,7 +100,7 @@
 the corresponding accessors. This information will be verified by the
 :class:`Problem` when added with :func:`Problem::AddResidualBlock`.
 
-.. function:: bool CostFunction::Evaluate(double const* const* parameters, double* residuals, double** jacobians)
+.. function:: bool CostFunction::Evaluate(double const* const* parameters, double* residuals, double** jacobians) const
 
    Compute the residual vector and the Jacobian matrices.
 
@@ -179,12 +181,19 @@
      class AutoDiffCostFunction : public
      SizedCostFunction<kNumResiduals, Ns> {
       public:
-       AutoDiffCostFunction(CostFunctor* functor, ownership = TAKE_OWNERSHIP);
+       // Instantiate CostFunctor using the supplied arguments.
+       template<class ...Args>
+       explicit AutoDiffCostFunction(Args&& ...args);
+       explicit AutoDiffCostFunction(std::unique_ptr<CostFunctor> functor);
+       explicit AutoDiffCostFunction(CostFunctor* functor, ownership = TAKE_OWNERSHIP);
+
        // Ignore the template parameter kNumResiduals and use
        // num_residuals instead.
        AutoDiffCostFunction(CostFunctor* functor,
                             int num_residuals,
                             ownership = TAKE_OWNERSHIP);
+       AutoDiffCostFunction(std::unique_ptr<CostFunctor> functor,
+                            int num_residuals);
      };
 
    To get an auto differentiated cost function, you must define a
@@ -242,9 +251,9 @@
 
    .. code-block:: c++
 
-    CostFunction* cost_function
-        = new AutoDiffCostFunction<MyScalarCostFunctor, 1, 2, 2>(
-            new MyScalarCostFunctor(1.0));              ^  ^  ^
+    auto* cost_function
+        = new AutoDiffCostFunction<MyScalarCostFunctor, 1, 2, 2>(1.0);
+                                                        ^  ^  ^
                                                         |  |  |
                             Dimension of residual ------+  |  |
                             Dimension of x ----------------+  |
@@ -270,7 +279,7 @@
    .. code-block:: c++
 
     MyScalarCostFunctor functor(1.0)
-    CostFunction* cost_function
+    auto* cost_function
         = new AutoDiffCostFunction<MyScalarCostFunctor, 1, 2, 2>(
             &functor, DO_NOT_TAKE_OWNERSHIP);
 
@@ -279,9 +288,11 @@
 
    .. code-block:: c++
 
-     CostFunction* cost_function
-         = new AutoDiffCostFunction<MyScalarCostFunctor, DYNAMIC, 2, 2>(
-             new CostFunctorWithDynamicNumResiduals(1.0),   ^     ^  ^
+     auto functor = std::make_unique<CostFunctorWithDynamicNumResiduals>(1.0);
+     auto* cost_function
+         = new AutoDiffCostFunction<CostFunctorWithDynamicNumResiduals,
+                                                         DYNAMIC, 2, 2>(
+             std::move(functor),                            ^     ^  ^
              runtime_number_of_residuals); <----+           |     |  |
                                                 |           |     |  |
                                                 |           |     |  |
@@ -290,13 +301,13 @@
                Dimension of x ------------------------------------+  |
                Dimension of y ---------------------------------------+
 
-   **WARNING 1** A common beginner's error when first using
-   :class:`AutoDiffCostFunction` is to get the sizing wrong. In particular,
-   there is a tendency to set the template parameters to (dimension of
-   residual, number of parameters) instead of passing a dimension
-   parameter for *every parameter block*. In the example above, that
-   would be ``<MyScalarCostFunction, 1, 2>``, which is missing the 2
-   as the last template argument.
+   .. warning::
+       A common beginner's error when first using :class:`AutoDiffCostFunction`
+       is to get the sizing wrong. In particular, there is a tendency to set the
+       template parameters to (dimension of residual, number of parameters)
+       instead of passing a dimension parameter for *every parameter block*. In
+       the example above, that would be ``<MyScalarCostFunction, 1, 2>``, which
+       is missing the 2 as the last template argument.
 
 
 :class:`DynamicAutoDiffCostFunction`
@@ -334,9 +345,7 @@
 
      .. code-block:: c++
 
-       DynamicAutoDiffCostFunction<MyCostFunctor, 4>* cost_function =
-         new DynamicAutoDiffCostFunction<MyCostFunctor, 4>(
-           new MyCostFunctor());
+       auto* cost_function = new DynamicAutoDiffCostFunction<MyCostFunctor, 4>();
        cost_function->AddParameterBlock(5);
        cost_function->AddParameterBlock(10);
        cost_function->SetNumResiduals(21);
@@ -441,9 +450,9 @@
 
   .. code-block:: c++
 
-    CostFunction* cost_function
-        = new NumericDiffCostFunction<MyScalarCostFunctor, CENTRAL, 1, 2, 2>(
-            new MyScalarCostFunctor(1.0));                    ^     ^  ^  ^
+    auto* cost_function
+        = new NumericDiffCostFunction<MyScalarCostFunctor, CENTRAL, 1, 2, 2>(1.0)
+                                                              ^     ^  ^  ^
                                                               |     |  |  |
                                   Finite Differencing Scheme -+     |  |  |
                                   Dimension of residual ------------+  |  |
@@ -463,17 +472,18 @@
 
    .. code-block:: c++
 
-     CostFunction* cost_function
-         = new NumericDiffCostFunction<MyScalarCostFunctor, CENTRAL, DYNAMIC, 2, 2>(
-             new CostFunctorWithDynamicNumResiduals(1.0),               ^     ^  ^
-             TAKE_OWNERSHIP,                                            |     |  |
-             runtime_number_of_residuals); <----+                       |     |  |
-                                                |                       |     |  |
-                                                |                       |     |  |
-               Actual number of residuals ------+                       |     |  |
-               Indicate dynamic number of residuals --------------------+     |  |
-               Dimension of x ------------------------------------------------+  |
-               Dimension of y ---------------------------------------------------+
+     auto functor = std::make_unique<CostFunctorWithDynamicNumResiduals>(1.0);
+     auto* cost_function
+         = new NumericDiffCostFunction<CostFunctorWithDynamicNumResiduals,
+                                                CENTRAL, DYNAMIC, 2, 2>(
+             std::move(functor),                            ^     ^  ^
+             runtime_number_of_residuals); <----+           |     |  |
+                                                |           |     |  |
+                                                |           |     |  |
+               Actual number of residuals ------+           |     |  |
+               Indicate dynamic number of residuals --------+     |  |
+               Dimension of x ------------------------------------+  |
+               Dimension of y ---------------------------------------+
 
 
   There are three available numeric differentiation schemes in ceres-solver:
@@ -502,18 +512,18 @@
   results, either try forward difference to improve performance or
   Ridders' method to improve accuracy.
 
-  **WARNING** A common beginner's error when first using
-  :class:`NumericDiffCostFunction` is to get the sizing wrong. In
-  particular, there is a tendency to set the template parameters to
-  (dimension of residual, number of parameters) instead of passing a
-  dimension parameter for *every parameter*. In the example above,
-  that would be ``<MyScalarCostFunctor, 1, 2>``, which is missing the
-  last ``2`` argument. Please be careful when setting the size
-  parameters.
+  .. warning::
+      A common beginner's error when first using
+      :class:`NumericDiffCostFunction` is to get the sizing wrong. In
+      particular, there is a tendency to set the template parameters to
+      (dimension of residual, number of parameters) instead of passing a
+      dimension parameter for *every parameter*. In the example above, that
+      would be ``<MyScalarCostFunctor, 1, 2>``, which is missing the last ``2``
+      argument. Please be careful when setting the size parameters.
 
 
-Numeric Differentiation & LocalParameterization
------------------------------------------------
+Numeric Differentiation & Manifolds
+-----------------------------------
 
    If your cost function depends on a parameter block that must lie on
    a manifold and the functor cannot be evaluated for values of that
@@ -522,11 +532,10 @@
 
    This is because numeric differentiation in Ceres is performed by
    perturbing the individual coordinates of the parameter blocks that
-   a cost functor depends on. In doing so, we assume that the
-   parameter blocks live in an Euclidean space and ignore the
-   structure of manifold that they live As a result some of the
-   perturbations may not lie on the manifold corresponding to the
-   parameter block.
+   a cost functor depends on. This perturbation assumes that the
+   parameter block lives on a Euclidean Manifold rather than the
+   actual manifold associated with the parameter block. As a result
+   some of the perturbed points may not lie on the manifold anymore.
 
    For example consider a four dimensional parameter block that is
    interpreted as a unit Quaternion. Perturbing the coordinates of
@@ -534,7 +543,7 @@
    parameter block.
 
    Fixing this problem requires that :class:`NumericDiffCostFunction`
-   be aware of the :class:`LocalParameterization` associated with each
+   be aware of the :class:`Manifold` associated with each
    parameter block and only generate perturbations in the local
    tangent space of each parameter block.
 
@@ -568,9 +577,8 @@
 
    .. code-block:: c++
 
-     CostFunction* cost_function
-         = new NumericDiffCostFunction<MyCostFunction, CENTRAL, 1, 4, 8>(
-             new MyCostFunction(...), TAKE_OWNERSHIP);
+     auto* cost_function
+         = new NumericDiffCostFunction<MyCostFunction, CENTRAL, 1, 4, 8>(...);
 
    where ``MyCostFunction`` has 1 residual and 2 parameter blocks with
    sizes 4 and 8 respectively. Look at the tests for a more detailed
@@ -611,8 +619,7 @@
 
      .. code-block:: c++
 
-       DynamicNumericDiffCostFunction<MyCostFunctor>* cost_function =
-         new DynamicNumericDiffCostFunction<MyCostFunctor>(new MyCostFunctor);
+       auto cost_function = std::make_unique<DynamicNumericDiffCostFunction<MyCostFunctor>>();
        cost_function->AddParameterBlock(5);
        cost_function->AddParameterBlock(10);
        cost_function->SetNumResiduals(21);
@@ -620,9 +627,9 @@
    As a rule of thumb, try using :class:`NumericDiffCostFunction` before
    you use :class:`DynamicNumericDiffCostFunction`.
 
-   **WARNING** The same caution about mixing local parameterizations
-   with numeric differentiation applies as is the case with
-   :class:`NumericDiffCostFunction`.
+   .. warning::
+       The same caution about mixing manifolds with numeric differentiation
+       applies as is the case with :class:`NumericDiffCostFunction`.
 
 :class:`CostFunctionToFunctor`
 ==============================
@@ -671,8 +678,8 @@
    .. code-block:: c++
 
     struct CameraProjection {
-      CameraProjection(double* observation)
-      : intrinsic_projection_(new IntrinsicProjection(observation)) {
+      explicit CameraProjection(double* observation)
+      : intrinsic_projection_(std::make_unique<IntrinsicProjection>(observation)) {
       }
 
       template <typename T>
@@ -690,7 +697,7 @@
       }
 
      private:
-      CostFunctionToFunctor<2,5,3> intrinsic_projection_;
+      CostFunctionToFunctor<2, 5, 3> intrinsic_projection_;
     };
 
    Note that :class:`CostFunctionToFunctor` takes ownership of the
@@ -732,10 +739,9 @@
   .. code-block:: c++
 
    struct CameraProjection {
-     CameraProjection(double* observation)
+     explicit CameraProjection(double* observation)
         : intrinsic_projection_(
-              new NumericDiffCostFunction<IntrinsicProjection, CENTRAL, 2, 5, 3>(
-                  new IntrinsicProjection(observation))) {}
+              std::make_unique<NumericDiffCostFunction<IntrinsicProjection, CENTRAL, 2, 5, 3>>()) {}
 
      template <typename T>
      bool operator()(const T* rotation,
@@ -793,8 +799,8 @@
    .. code-block:: c++
 
     struct CameraProjection {
-      CameraProjection(double* observation)
-          : intrinsic_projection_(new IntrinsicProjection(observation)) {
+      explicit CameraProjection(double* observation)
+          : intrinsic_projection_(std::make_unique<IntrinsicProjection>(observation)) {
       }
 
       template <typename T>
@@ -841,7 +847,7 @@
        //  my_cost_function produces N residuals
        CostFunction* my_cost_function = ...
        CHECK_EQ(N, my_cost_function->num_residuals());
-       vector<CostFunction*> conditioners;
+       std::vector<CostFunction*> conditioners;
 
        //  Make N 1x1 cost functions (1 parameter, 1 residual)
        CostFunction* f_1 = ...
@@ -864,38 +870,39 @@
 
 
 :class:`GradientChecker`
-================================
+========================
 
 .. class:: GradientChecker
 
-    This class compares the Jacobians returned by a cost function against
-    derivatives estimated using finite differencing. It is meant as a tool for
-    unit testing, giving you more fine-grained control than the check_gradients
-    option in the solver options.
+    This class compares the Jacobians returned by a cost function
+    against derivatives estimated using finite differencing. It is
+    meant as a tool for unit testing, giving you more fine-grained
+    control than the check_gradients option in the solver options.
 
     The condition enforced is that
 
     .. math:: \forall{i,j}: \frac{J_{ij} - J'_{ij}}{max_{ij}(J_{ij} - J'_{ij})} < r
 
-    where :math:`J_{ij}` is the jacobian as computed by the supplied cost
-    function (by the user) multiplied by the local parameterization Jacobian,
+    where :math:`J_{ij}` is the jacobian as computed by the supplied
+    cost function multiplied by the `Manifold::PlusJacobian`,
     :math:`J'_{ij}` is the jacobian as computed by finite differences,
-    multiplied by the local parameterization Jacobian as well, and :math:`r`
+    multiplied by the `Manifold::PlusJacobian` as well, and :math:`r`
     is the relative precision.
 
    Usage:
 
    .. code-block:: c++
 
-       //  my_cost_function takes two parameter blocks. The first has a local
-       //  parameterization associated with it.
+       // my_cost_function takes two parameter blocks. The first has a
+       // manifold associated with it.
+
        CostFunction* my_cost_function = ...
-       LocalParameterization* my_parameterization = ...
+       Manifold* my_manifold = ...
        NumericDiffOptions numeric_diff_options;
 
-       std::vector<LocalParameterization*> local_parameterizations;
-       local_parameterizations.push_back(my_parameterization);
-       local_parameterizations.push_back(nullptr);
+       std::vector<Manifold*> manifolds;
+       manifolds.push_back(my_manifold);
+       manifolds.push_back(nullptr);
 
        std::vector parameter1;
        std::vector parameter2;
@@ -906,7 +913,8 @@
        parameter_blocks.push_back(parameter2.data());
 
        GradientChecker gradient_checker(my_cost_function,
-           local_parameterizations, numeric_diff_options);
+                                        manifolds,
+                                        numeric_diff_options);
        GradientCheckResults results;
        if (!gradient_checker.Probe(parameter_blocks.data(), 1e-9, &results) {
          LOG(ERROR) << "An error has occurred:\n" << results.error_log;
@@ -1065,6 +1073,10 @@
 
    .. math:: \rho(s,a,b) = b \log(1 + e^{(s - a) / b}) - b \log(1 + e^{-a / b})
 
+.. class:: TukeyLoss
+
+   .. math:: \rho(s) = \begin{cases} \frac{1}{3} (1 - (1 - s)^3) & s \le 1\\ \frac{1}{3} & s > 1 \end{cases}
+
 .. class:: ComposedLoss
 
    Given two loss functions ``f`` and ``g``, implements the loss
@@ -1115,9 +1127,8 @@
 
      // Add parameter blocks
 
-     CostFunction* cost_function =
-         new AutoDiffCostFunction < UW_Camera_Mapper, 2, 9, 3>(
-             new UW_Camera_Mapper(feature_x, feature_y));
+     auto* cost_function =
+         new AutoDiffCostFunction<UW_Camera_Mapper, 2, 9, 3>(feature_x, feature_y);
 
      LossFunctionWrapper* loss_function(new HuberLoss(1.0), TAKE_OWNERSHIP);
      problem.AddResidualBlock(cost_function, loss_function, parameters);
@@ -1154,7 +1165,6 @@
 matrix such that the robustified Gauss-Newton step corresponds to an
 ordinary linear least squares problem.
 
-
 Let :math:`\alpha` be a root of
 
 .. math:: \frac{1}{2}\alpha^2 - \alpha - \frac{\rho''}{\rho'}\|f(x)\|^2 = 0.
@@ -1178,363 +1188,644 @@
 problems.
 
 
-:class:`LocalParameterization`
-==============================
-
-.. class:: LocalParameterization
-
-  In many optimization problems, especially sensor fusion problems,
-  one has to model quantities that live in spaces known as `Manifolds
-  <https://en.wikipedia.org/wiki/Manifold>`_ , for example the
-  rotation/orientation of a sensor that is represented by a
-  `Quaternion
-  <https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation>`_.
-
-  Manifolds are spaces, which locally look like Euclidean spaces. More
-  precisely, at each point on the manifold there is a linear space
-  that is tangent to the manifold. It has dimension equal to the
-  intrinsic dimension of the manifold itself, which is less than or
-  equal to the ambient space in which the manifold is embedded.
-
-  For example, the tangent space to a point on a sphere in three
-  dimensions is the two dimensional plane that is tangent to the
-  sphere at that point. There are two reasons tangent spaces are
-  interesting:
-
-  1. They are Euclidean spaces, so the usual vector space operations
-     apply there, which makes numerical operations easy.
-
-  2. Movement in the tangent space translate into movements along the
-     manifold.  Movements perpendicular to the tangent space do not
-     translate into movements on the manifold.
-
-     Returning to our sphere example, moving in the 2 dimensional
-     plane tangent to the sphere and projecting back onto the sphere
-     will move you away from the point you started from but moving
-     along the normal at the same point and the projecting back onto
-     the sphere brings you back to the point.
-
-  Besides the mathematical niceness, modeling manifold valued
-  quantities correctly and paying attention to their geometry has
-  practical benefits too:
-
-  1. It naturally constrains the quantity to the manifold through out
-     the optimization. Freeing the user from hacks like *quaternion
-     normalization*.
-
-  2. It reduces the dimension of the optimization problem to its
-     *natural* size. For example, a quantity restricted to a line, is a
-     one dimensional object regardless of the dimension of the ambient
-     space in which this line lives.
-
-     Working in the tangent space reduces not just the computational
-     complexity of the optimization algorithm, but also improves the
-     numerical behaviour of the algorithm.
-
-  A basic operation one can perform on a manifold is the
-  :math:`\boxplus` operation that computes the result of moving along
-  delta in the tangent space at x, and then projecting back onto the
-  manifold that x belongs to. Also known as a *Retraction*,
-  :math:`\boxplus` is a generalization of vector addition in Euclidean
-  spaces. Formally, :math:`\boxplus` is a smooth map from a
-  manifold :math:`\mathcal{M}` and its tangent space
-  :math:`T_\mathcal{M}` to the manifold :math:`\mathcal{M}` that
-  obeys the identity
-
-  .. math::  \boxplus(x, 0) = x,\quad \forall x.
-
-  That is, it ensures that the tangent space is *centered* at :math:`x`
-  and the zero vector is the identity element. For more see
-  [Hertzberg]_ and section A.6.9 of [HartleyZisserman]_.
-
-  Let us consider two examples:
-
-  The Euclidean space :math:`R^n` is the simplest example of a
-  manifold. It has dimension :math:`n` (and so does its tangent space)
-  and :math:`\boxplus` is the familiar vector sum operation.
-
-    .. math:: \boxplus(x, \Delta) = x + \Delta
-
-  A more interesting case is :math:`SO(3)`, the special orthogonal
-  group in three dimensions - the space of 3x3 rotation
-  matrices. :math:`SO(3)` is a three dimensional manifold embedded in
-  :math:`R^9` or :math:`R^{3\times 3}`.
-
-  :math:`\boxplus` on :math:`SO(3)` is defined using the *Exponential*
-  map, from the tangent space (:math:`R^3`) to the manifold. The
-  Exponential map :math:`\operatorname{Exp}` is defined as:
-
-  .. math::
-
-     \operatorname{Exp}([p,q,r]) = \left [ \begin{matrix}
-     \cos \theta + cp^2 & -sr + cpq        &  sq + cpr \\
-     sr + cpq         & \cos \theta + cq^2& -sp + cqr \\
-     -sq + cpr        & sp + cqr         & \cos \theta + cr^2
-     \end{matrix} \right ]
-
-  where,
-
-  .. math::
-     \theta = \sqrt{p^2 + q^2 + r^2}, s = \frac{\sin \theta}{\theta},
-     c = \frac{1 - \cos \theta}{\theta^2}.
-
-  Then,
-
-  .. math::
-
-     \boxplus(x, \Delta) = x \operatorname{Exp}(\Delta)
-
-  The ``LocalParameterization`` interface allows the user to define
-  and associate with parameter blocks the manifold that they belong
-  to. It does so by defining the ``Plus`` (:math:`\boxplus`) operation
-  and its derivative with respect to :math:`\Delta` at :math:`\Delta =
-  0`.
-
-   .. code-block:: c++
-
-     class LocalParameterization {
-      public:
-       virtual ~LocalParameterization() {}
-       virtual bool Plus(const double* x,
-                         const double* delta,
-                         double* x_plus_delta) const = 0;
-       virtual bool ComputeJacobian(const double* x, double* jacobian) const = 0;
-       virtual bool MultiplyByJacobian(const double* x,
-                                       const int num_rows,
-                                       const double* global_matrix,
-                                       double* local_matrix) const;
-       virtual int GlobalSize() const = 0;
-       virtual int LocalSize() const = 0;
-     };
+While the theory described above is elegant, in practice we observe
+that using the Triggs correction when :math:`\rho'' > 0` leads to poor
+performance, so we upper bound it by zero. For more details see
+`corrector.cc <https://github.com/ceres-solver/ceres-solver/blob/master/internal/ceres/corrector.cc#L51>`_
 
 
-.. function:: int LocalParameterization::GlobalSize()
+:class:`Manifold`
+==================
 
-   The dimension of the ambient space in which the parameter block
-   :math:`x` lives.
+.. class:: Manifold
 
-.. function:: int LocalParameterization::LocalSize()
+In sensor fusion problems, we often have to model quantities that live
+in spaces known as `Manifolds
+<https://en.wikipedia.org/wiki/Manifold>`_, for example the
+rotation/orientation of a sensor that is represented by a `Quaternion
+<https://en.wikipedia.org/wiki/Quaternion>`_.
 
-   The size of the tangent space that :math:`\Delta` lives in.
+Manifolds are spaces which locally look like Euclidean spaces. More
+precisely, at each point on the manifold there is a linear space that
+is tangent to the manifold. It has dimension equal to the intrinsic
+dimension of the manifold itself, which is less than or equal to the
+ambient space in which the manifold is embedded.
 
-.. function:: bool LocalParameterization::Plus(const double* x, const double* delta, double* x_plus_delta) const
+For example, the tangent space to a point on a sphere in three
+dimensions is the two dimensional plane that is tangent to the sphere
+at that point. There are two reasons tangent spaces are interesting:
 
-    :func:`LocalParameterization::Plus` implements :math:`\boxplus(x,\Delta)`.
+1. They are Eucliean spaces so the usual vector space operations apply
+   there, which makes numerical operations easy.
 
-.. function:: bool LocalParameterization::ComputeJacobian(const double* x, double* jacobian) const
+2. Movements in the tangent space translate into movements along the
+   manifold.  Movements perpendicular to the tangent space do not
+   translate into movements on the manifold.
 
-   Computes the Jacobian matrix
+However, moving along the 2 dimensional plane tangent to the sphere
+and projecting back onto the sphere will move you away from the point
+you started from but moving along the normal at the same point and the
+projecting back onto the sphere brings you back to the point.
 
-   .. math:: J = D_2 \boxplus(x, 0)
+Besides the mathematical niceness, modeling manifold valued
+quantities correctly and paying attention to their geometry has
+practical benefits too:
 
-   in row major form.
+1. It naturally constrains the quantity to the manifold throughout the
+   optimization, freeing the user from hacks like *quaternion
+   normalization*.
 
-.. function:: bool MultiplyByJacobian(const double* x, const int num_rows, const double* global_matrix, double* local_matrix) const
+2. It reduces the dimension of the optimization problem to its
+   *natural* size. For example, a quantity restricted to a line is a
+   one dimensional object regardless of the dimension of the ambient
+   space in which this line lives.
 
-   ``local_matrix = global_matrix * jacobian``
+   Working in the tangent space reduces not just the computational
+   complexity of the optimization algorithm, but also improves the
+   numerical behaviour of the algorithm.
 
-   ``global_matrix`` is a ``num_rows x GlobalSize``  row major matrix.
-   ``local_matrix`` is a ``num_rows x LocalSize`` row major matrix.
-   ``jacobian`` is the matrix returned by :func:`LocalParameterization::ComputeJacobian` at :math:`x`.
+A basic operation one can perform on a manifold is the
+:math:`\boxplus` operation that computes the result of moving along
+:math:`\delta` in the tangent space at :math:`x`, and then projecting
+back onto the manifold that :math:`x` belongs to. Also known as a
+*Retraction*, :math:`\boxplus` is a generalization of vector addition
+in Euclidean spaces.
 
-   This is only used by :class:`GradientProblem`. For most normal
-   uses, it is okay to use the default implementation.
+The inverse of :math:`\boxplus` is :math:`\boxminus`, which given two
+points :math:`y` and :math:`x` on the manifold computes the tangent
+vector :math:`\Delta` at :math:`x` s.t. :math:`\boxplus(x, \Delta) =
+y`.
+
+Let us now consider two examples.
+
+The `Euclidean space <https://en.wikipedia.org/wiki/Euclidean_space>`_
+:math:`\mathbb{R}^n` is the simplest example of a manifold. It has
+dimension :math:`n` (and so does its tangent space) and
+:math:`\boxplus` and :math:`\boxminus` are the familiar vector sum and
+difference operations.
+
+.. math::
+   \begin{align*}
+   \boxplus(x, \Delta) &= x + \Delta = y\\
+   \boxminus(y, x) &= y - x = \Delta.
+   \end{align*}
+
+A more interesting case is the case :math:`SO(3)`, the `special
+orthogonal group <https://en.wikipedia.org/wiki/3D_rotation_group>`_
+in three dimensions - the space of :math:`3\times3` rotation
+matrices. :math:`SO(3)` is a three dimensional manifold embedded in
+:math:`\mathbb{R}^9` or :math:`\mathbb{R}^{3\times 3}`.  So points on :math:`SO(3)` are
+represented using 9 dimensional vectors or :math:`3\times 3` matrices,
+and points in its tangent spaces are represented by 3 dimensional
+vectors.
+
+For :math:`SO(3)`, :math:`\boxplus` and :math:`\boxminus` are defined
+in terms of the matrix :math:`\exp` and :math:`\log` operations as
+follows:
+
+Given a 3-vector :math:`\Delta = [\begin{matrix}p,&q,&r\end{matrix}]`, we have
+
+.. math::
+
+   \exp(\Delta) & = \left [ \begin{matrix}
+   \cos \theta + cp^2 & -sr + cpq        &  sq + cpr \\
+   sr + cpq         & \cos \theta + cq^2& -sp + cqr \\
+   -sq + cpr        & sp + cqr         & \cos \theta + cr^2
+   \end{matrix} \right ]
+
+where,
+
+.. math::
+     \begin{align}
+     \theta &= \sqrt{p^2 + q^2 + r^2},\\
+     s &= \frac{\sin \theta}{\theta},\\
+     c &= \frac{1 - \cos \theta}{\theta^2}.
+     \end{align}
+
+Given :math:`x \in SO(3)`, we have
+
+.. math::
+
+   \log(x) = 1/(2 \sin(\theta)/\theta)\left[\begin{matrix} x_{32} - x_{23},& x_{13} - x_{31},& x_{21} - x_{12}\end{matrix} \right]
+
+
+where,
+
+.. math:: \theta = \cos^{-1}((\operatorname{Trace}(x) - 1)/2)
+
+Then,
+
+.. math::
+   \begin{align*}
+   \boxplus(x, \Delta) &= x \exp(\Delta)
+   \\
+   \boxminus(y, x) &= \log(x^T y)
+   \end{align*}
+
+For :math:`\boxplus` and :math:`\boxminus` to be mathematically
+consistent, the following identities must be satisfied at all points
+:math:`x` on the manifold:
+
+1. :math:`\boxplus(x, 0) = x`. This ensures that the tangent space is
+   *centered* at :math:`x`, and the zero vector is the identity
+   element.
+2. For all :math:`y` on the manifold, :math:`\boxplus(x,
+   \boxminus(y,x)) = y`. This ensures that any :math:`y` can be
+   reached from :math:`x`.
+3. For all :math:`\Delta`, :math:`\boxminus(\boxplus(x, \Delta), x) =
+   \Delta`. This ensures that :math:`\boxplus` is an injective
+   (one-to-one) map.
+4. For all :math:`\Delta_1, \Delta_2\ |\boxminus(\boxplus(x, \Delta_1),
+   \boxplus(x, \Delta_2)) \leq |\Delta_1 - \Delta_2|`. Allows us to define
+   a metric on the manifold.
+
+Additionally we require that :math:`\boxplus` and :math:`\boxminus` be
+sufficiently smooth. In particular they need to be differentiable
+everywhere on the manifold.
+
+For more details, please see [Hertzberg]_
+
+The :class:`Manifold` interface allows the user to define a manifold
+for the purposes optimization by implementing ``Plus`` and ``Minus``
+operations and their derivatives (corresponding naturally to
+:math:`\boxplus` and :math:`\boxminus`).
+
+.. code-block:: c++
+
+  class Manifold {
+   public:
+    virtual ~Manifold();
+    virtual int AmbientSize() const = 0;
+    virtual int TangentSize() const = 0;
+    virtual bool Plus(const double* x,
+                      const double* delta,
+                      double* x_plus_delta) const = 0;
+    virtual bool PlusJacobian(const double* x, double* jacobian) const = 0;
+    virtual bool RightMultiplyByPlusJacobian(const double* x,
+                                             const int num_rows,
+                                             const double* ambient_matrix,
+                                             double* tangent_matrix) const;
+    virtual bool Minus(const double* y,
+                       const double* x,
+                       double* y_minus_x) const = 0;
+    virtual bool MinusJacobian(const double* x, double* jacobian) const = 0;
+  };
+
+
+.. function:: int Manifold::AmbientSize() const;
+
+   Dimension of the ambient space in which the manifold is embedded.
+
+.. function:: int Manifold::TangentSize() const;
+
+   Dimension of the manifold/tangent space.
+
+.. function:: bool Plus(const double* x, const double* delta, double* x_plus_delta) const;
+
+   Implements the :math:`\boxplus(x,\Delta)` operation for the manifold.
+
+   A generalization of vector addition in Euclidean space, ``Plus``
+   computes the result of moving along ``delta`` in the tangent space
+   at ``x``, and then projecting back onto the manifold that ``x``
+   belongs to.
+
+   ``x`` and ``x_plus_delta`` are :func:`Manifold::AmbientSize` vectors.
+   ``delta`` is a :func:`Manifold::TangentSize` vector.
+
+   Return value indicates if the operation was successful or not.
+
+.. function:: bool PlusJacobian(const double* x, double* jacobian) const;
+
+   Compute the derivative of :math:`\boxplus(x, \Delta)` w.r.t
+   :math:`\Delta` at :math:`\Delta = 0`, i.e. :math:`(D_2
+   \boxplus)(x, 0)`.
+
+   ``jacobian`` is a row-major :func:`Manifold::AmbientSize`
+   :math:`\times` :func:`Manifold::TangentSize` matrix.
+
+   Return value indicates whether the operation was successful or not.
+
+.. function:: bool RightMultiplyByPlusJacobian(const double* x, const int num_rows, const double* ambient_matrix, double* tangent_matrix) const;
+
+   ``tangent_matrix`` = ``ambient_matrix`` :math:`\times` plus_jacobian.
+
+
+   ``ambient_matrix`` is a row-major ``num_rows`` :math:`\times`
+   :func:`Manifold::AmbientSize` matrix.
+
+   ``tangent_matrix`` is a row-major ``num_rows`` :math:`\times`
+   :func:`Manifold::TangentSize` matrix.
+
+   Return value indicates whether the operation was successful or not.
+
+   This function is only used by the :class:`GradientProblemSolver`,
+   where the dimension of the parameter block can be large and it may
+   be more efficient to compute this product directly rather than
+   first evaluating the Jacobian into a matrix and then doing a matrix
+   vector product.
+
+   Because this is not an often used function, we provide a default
+   implementation for convenience. If performance becomes an issue
+   then the user should consider implementing a specialization.
+
+.. function:: bool Minus(const double* y, const double* x, double* y_minus_x) const;
+
+   Implements :math:`\boxminus(y,x)` operation for the manifold.
+
+   A generalization of vector subtraction in Euclidean spaces, given
+   two points ``x`` and ``y`` on the manifold, ``Minus`` computes the
+   change to ``x`` in the tangent space at ``x``, that will take it to
+   ``y``.
+
+   ``x`` and ``y`` are :func:`Manifold::AmbientSize` vectors.
+   ``y_minus_x`` is a ::func:`Manifold::TangentSize` vector.
+
+   Return value indicates if the operation was successful or not.
+
+.. function:: bool MinusJacobian(const double* x, double* jacobian) const = 0;
+
+   Compute the derivative of :math:`\boxminus(y, x)` w.r.t :math:`y`
+   at :math:`y = x`, i.e :math:`(D_1 \boxminus) (x, x)`.
+
+   ``jacobian`` is a row-major :func:`Manifold::TangentSize`
+   :math:`\times` :func:`Manifold::AmbientSize` matrix.
+
+   Return value indicates whether the operation was successful or not.
 
 Ceres Solver ships with a number of commonly used instances of
-:class:`LocalParameterization`. Another great place to find high
-quality implementations of :math:`\boxplus` operations on a variety of
-manifolds is the `Sophus <https://github.com/strasdat/Sophus>`_
-library developed by Hauke Strasdat and his collaborators.
+:class:`Manifold`.
 
-:class:`IdentityParameterization`
----------------------------------
+For `Lie Groups <https://en.wikipedia.org/wiki/Lie_group>`_, a great
+place to find high quality implementations is the `Sophus
+<https://github.com/strasdat/Sophus>`_ library developed by Hauke
+Strasdat and his collaborators.
 
-A trivial version of :math:`\boxplus` is when :math:`\Delta` is of the
-same size as :math:`x` and
+:class:`EuclideanManifold`
+--------------------------
 
-.. math::  \boxplus(x, \Delta) = x + \Delta
+.. class:: EuclideanManifold
 
-This is the same as :math:`x` living in a Euclidean manifold.
+:class:`EuclideanManifold` as the name implies represents a Euclidean
+space, where the :math:`\boxplus` and :math:`\boxminus` operations are
+the usual vector addition and subtraction.
 
-:class:`QuaternionParameterization`
------------------------------------
+.. math::
 
-Another example that occurs commonly in Structure from Motion problems
-is when camera rotations are parameterized using a quaternion. This is
-a 3-dimensional manifold that lives in 4-dimensional space.
+   \begin{align*}
+     \boxplus(x, \Delta) &= x + \Delta\\
+      \boxminus(y,x) &= y - x
+   \end{align*}
 
-.. math:: \boxplus(x, \Delta) = \left[ \cos(|\Delta|), \frac{\sin\left(|\Delta|\right)}{|\Delta|} \Delta \right] * x
+By default parameter blocks are assumed to be Euclidean, so there is
+no need to use this manifold on its own. It is provided for the
+purpose of testing and for use in combination with other manifolds
+using :class:`ProductManifold`.
 
-The multiplication :math:`*` between the two 4-vectors on the right
-hand side is the standard quaternion product.
+The class works with dynamic and static ambient space dimensions. If
+the ambient space dimensions is known at compile time use
 
-:class:`EigenQuaternionParameterization`
-----------------------------------------
+.. code-block:: c++
 
-`Eigen <http://eigen.tuxfamily.org/index.php?title=Main_Page>`_ uses a
-different internal memory layout for the elements of the quaternion
-than what is commonly used. Specifically, Eigen stores the elements in
-memory as :math:`(x, y, z, w)`, i.e., the *real* part (:math:`w`) is
-stored as the last element. Note, when creating an Eigen quaternion
-through the constructor the elements are accepted in :math:`w, x, y,
-z` order.
+   EuclideanManifold<3> manifold;
 
-Since Ceres operates on parameter blocks which are raw ``double``
-pointers this difference is important and requires a different
-parameterization. :class:`EigenQuaternionParameterization` uses the
-same ``Plus`` operation as :class:`QuaternionParameterization` but
-takes into account Eigen's internal memory element ordering.
+If the ambient space dimensions is not known at compile time the
+template parameter needs to be set to `ceres::DYNAMIC` and the actual
+dimension needs to be provided as a constructor argument:
 
-:class:`SubsetParameterization`
--------------------------------
+.. code-block:: c++
+
+   EuclideanManifold<ceres::DYNAMIC> manifold(ambient_dim);
+
+:class:`SubsetManifold`
+-----------------------
+
+.. class:: SubsetManifold
 
 Suppose :math:`x` is a two dimensional vector, and the user wishes to
 hold the first coordinate constant. Then, :math:`\Delta` is a scalar
 and :math:`\boxplus` is defined as
 
-.. math:: \boxplus(x, \Delta) = x + \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \Delta
+.. math::
+   \boxplus(x, \Delta) = x + \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \Delta
 
-:class:`SubsetParameterization` generalizes this construction to hold
+and given two, two-dimensional vectors :math:`x` and :math:`y` with
+the same first coordinate, :math:`\boxminus` is defined as:
+
+.. math::
+   \boxminus(y, x) = y[1] - x[1]
+
+:class:`SubsetManifold` generalizes this construction to hold
 any part of a parameter block constant by specifying the set of
 coordinates that are held constant.
 
 .. NOTE::
-   It is legal to hold all coordinates of a parameter block to constant
-   using a :class:`SubsetParameterization`. It is the same as calling
+
+   It is legal to hold *all* coordinates of a parameter block to
+   constant using a :class:`SubsetManifold`. It is the same as calling
    :func:`Problem::SetParameterBlockConstant` on that parameter block.
 
-:class:`HomogeneousVectorParameterization`
-------------------------------------------
 
-In computer vision, homogeneous vectors are commonly used to represent
-objects in projective geometry such as points in projective space. One
-example where it is useful to use this over-parameterization is in
-representing points whose triangulation is ill-conditioned. Here it is
-advantageous to use homogeneous vectors, instead of an Euclidean
-vector, because it can represent points at and near infinity.
+:class:`ProductManifold`
+------------------------
 
-:class:`HomogeneousVectorParameterization` defines a
-:class:`LocalParameterization` for an :math:`n-1` dimensional
-manifold that embedded in :math:`n` dimensional space where the
-scale of the vector does not matter, i.e., elements of the
-projective space :math:`\mathbb{P}^{n-1}`. It assumes that the last
-coordinate of the :math:`n`-vector is the *scalar* component of the
-homogenous vector, i.e., *finite* points in this representation are
-those for which the *scalar* component is non-zero.
-
-Further, ``HomogeneousVectorParameterization::Plus`` preserves the
-scale of :math:`x`.
-
-:class:`LineParameterization`
------------------------------
-
-This class provides a parameterization for lines, where the line is
-defined using an origin point and a direction vector. So the
-parameter vector size needs to be two times the ambient space
-dimension, where the first half is interpreted as the origin point
-and the second half as the direction. This local parameterization is
-a special case of the `Affine Grassmannian manifold
-<https://en.wikipedia.org/wiki/Affine_Grassmannian_(manifold))>`_
-for the case :math:`\operatorname{Graff}_1(R^n)`.
-
-Note that this is a parameterization for a line, rather than a point
-constrained to lie on a line. It is useful when one wants to optimize
-over the space of lines. For example, :math:`n` distinct points in 3D
-(measurements) we want to find the line that minimizes the sum of
-squared distances to all the points.
-
-:class:`ProductParameterization`
---------------------------------
-
-Consider an optimization problem over the space of rigid
-transformations :math:`SE(3)`, which is the Cartesian product of
-:math:`SO(3)` and :math:`\mathbb{R}^3`. Suppose you are using
-Quaternions to represent the rotation, Ceres ships with a local
-parameterization for that and :math:`\mathbb{R}^3` requires no, or
-:class:`IdentityParameterization` parameterization. So how do we
-construct a local parameterization for a parameter block a rigid
-transformation?
+.. class:: ProductManifold
 
 In cases, where a parameter block is the Cartesian product of a number
-of manifolds and you have the local parameterization of the individual
-manifolds available, :class:`ProductParameterization` can be used to
-construct a local parameterization of the cartesian product. For the
-case of the rigid transformation, where say you have a parameter block
-of size 7, where the first four entries represent the rotation as a
-quaternion, a local parameterization can be constructed as
+of manifolds and you have the manifold of the individual
+parameter blocks available, :class:`ProductManifold` can be used to
+construct a :class:`Manifold` of the Cartesian product.
+
+For the case of the rigid transformation, where say you have a
+parameter block of size 7, where the first four entries represent the
+rotation as a quaternion, and the next three the translation, a
+manifold can be constructed as:
 
 .. code-block:: c++
 
-   ProductParameterization se3_param(new QuaternionParameterization(),
-                                     new IdentityParameterization(3));
+   ProductManifold<QuaternionManifold, EuclideanManifold<3>> se3;
+
+Manifolds can be copied and moved to :class:`ProductManifold`:
+
+.. code-block:: c++
+
+   SubsetManifold manifold1(5, {2});
+   SubsetManifold manifold2(3, {0, 1});
+   ProductManifold<SubsetManifold, SubsetManifold> manifold(manifold1,
+                                                            manifold2);
+
+In advanced use cases, manifolds can be dynamically allocated and passed as (smart) pointers:
+
+.. code-block:: c++
+
+   ProductManifold<std::unique_ptr<QuaternionManifold>, EuclideanManifold<3>> se3
+        {std::make_unique<QuaternionManifold>(), EuclideanManifold<3>{}};
+
+The template parameters can also be left out as they are deduced automatically
+making the initialization much simpler:
+
+.. code-block:: c++
+
+   ProductManifold se3{QuaternionManifold{}, EuclideanManifold<3>{}};
 
 
-:class:`AutoDiffLocalParameterization`
-======================================
+:class:`QuaternionManifold`
+---------------------------
 
-.. class:: AutoDiffLocalParameterization
+.. class:: QuaternionManifold
 
-  :class:`AutoDiffLocalParameterization` does for
-  :class:`LocalParameterization` what :class:`AutoDiffCostFunction`
-  does for :class:`CostFunction`. It allows the user to define a
-  templated functor that implements the
-  :func:`LocalParameterization::Plus` operation and it uses automatic
-  differentiation to implement the computation of the Jacobian.
+.. NOTE::
 
-  To get an auto differentiated local parameterization, you must
-  define a class with a templated operator() (a functor) that computes
+   If you are using ``Eigen`` quaternions, then you should use
+   :class:`EigenQuaternionManifold` instead because ``Eigen`` uses a
+   different memory layout for its Quaternions.
 
-     .. math:: x' = \boxplus(x, \Delta x),
+Manifold for a Hamilton `Quaternion
+<https://en.wikipedia.org/wiki/Quaternion>`_. Quaternions are a three
+dimensional manifold represented as unit norm 4-vectors, i.e.
 
-  For example, Quaternions have a three dimensional local
-  parameterization. Its plus operation can be implemented as (taken
-  from `internal/ceres/autodiff_local_parameterization_test.cc
-  <https://ceres-solver.googlesource.com/ceres-solver/+/master/internal/ceres/autodiff_local_parameterization_test.cc>`_
-  )
+.. math:: q = \left [\begin{matrix}q_0,& q_1,& q_2,& q_3\end{matrix}\right], \quad \|q\| = 1
 
-    .. code-block:: c++
+is the ambient space representation. Here :math:`q_0` is the scalar
+part. :math:`q_1` is the coefficient of :math:`i`, :math:`q_2` is the
+coefficient of :math:`j`, and :math:`q_3` is the coefficient of
+:math:`k`. Where:
 
-      struct QuaternionPlus {
-        template<typename T>
-        bool operator()(const T* x, const T* delta, T* x_plus_delta) const {
-          const T squared_norm_delta =
-              delta[0] * delta[0] + delta[1] * delta[1] + delta[2] * delta[2];
+.. math::
 
-          T q_delta[4];
-          if (squared_norm_delta > 0.0) {
-            T norm_delta = sqrt(squared_norm_delta);
-            const T sin_delta_by_delta = sin(norm_delta) / norm_delta;
-            q_delta[0] = cos(norm_delta);
-            q_delta[1] = sin_delta_by_delta * delta[0];
-            q_delta[2] = sin_delta_by_delta * delta[1];
-            q_delta[3] = sin_delta_by_delta * delta[2];
-          } else {
-            // We do not just use q_delta = [1,0,0,0] here because that is a
-            // constant and when used for automatic differentiation will
-            // lead to a zero derivative. Instead we take a first order
-            // approximation and evaluate it at zero.
-            q_delta[0] = T(1.0);
-            q_delta[1] = delta[0];
-            q_delta[2] = delta[1];
-            q_delta[3] = delta[2];
-          }
+   \begin{align*}
+   i\times j &= k,\\
+   j\times k &= i,\\
+   k\times i &= j,\\
+   i\times i &= -1,\\
+   j\times j &= -1,\\
+   k\times k &= -1.
+   \end{align*}
 
-          Quaternionproduct(q_delta, x, x_plus_delta);
-          return true;
-        }
-      };
+The tangent space is three dimensional and the :math:`\boxplus` and
+:math:`\boxminus` operators are defined in term of :math:`\exp` and
+:math:`\log` operations.
 
-  Given this struct, the auto differentiated local
-  parameterization can now be constructed as
+.. math::
+   \begin{align*}
+   \boxplus(x, \Delta) &= \exp\left(\Delta\right) \otimes  x \\
+   \boxminus(y,x) &= \log\left(y \otimes x^{-1}\right)
+   \end{align*}
 
-  .. code-block:: c++
+Where :math:`\otimes` is the `Quaternion product
+<https://en.wikipedia.org/wiki/Quaternion#Hamilton_product>`_ and
+since :math:`x` is a unit quaternion, :math:`x^{-1} = [\begin{matrix}
+q_0,& -q_1,& -q_2,& -q_3\end{matrix}]`. Given a vector :math:`\Delta
+\in \mathbb{R}^3`,
 
-     LocalParameterization* local_parameterization =
-         new AutoDiffLocalParameterization<QuaternionPlus, 4, 3>;
-                                                           |  |
-                                Global Size ---------------+  |
-                                Local Size -------------------+
+.. math::
+   \exp(\Delta) = \left[ \begin{matrix}
+                         \cos\left(\|\Delta\|\right)\\
+			 \frac{\displaystyle \sin\left(|\Delta\|\right)}{\displaystyle \|\Delta\|} \Delta
+    	                 \end{matrix} \right]
 
+and given a unit quaternion :math:`q = \left [\begin{matrix}q_0,& q_1,& q_2,& q_3\end{matrix}\right]`
+
+.. math::
+
+   \log(q) =  \frac{\operatorname{atan2}\left(\sqrt{1-q_0^2},q_0\right)}{\sqrt{1-q_0^2}} \left [\begin{matrix}q_1,& q_2,& q_3\end{matrix}\right]
+
+
+:class:`EigenQuaternionManifold`
+--------------------------------
+
+.. class:: EigenQuaternionManifold
+
+Implements the quaternion manifold for `Eigen's
+<http://eigen.tuxfamily.org/index.php?title=Main_Page>`_
+representation of the Hamilton quaternion. Geometrically it is exactly
+the same as the :class:`QuaternionManifold` defined above. However,
+Eigen uses a different internal memory layout for the elements of the
+quaternion than what is commonly used. It stores the quaternion in
+memory as :math:`[q_1, q_2, q_3, q_0]` or :math:`[x, y, z, w]` where
+the real (scalar) part is last.
+
+Since Ceres operates on parameter blocks which are raw double pointers
+this difference is important and requires a different manifold.
+
+:class:`SphereManifold`
+-----------------------
+
+.. class:: SphereManifold
+
+This provides a manifold on a sphere meaning that the norm of the
+vector stays the same. Such cases often arises in Structure for Motion
+problems. One example where they are used is in representing points
+whose triangulation is ill-conditioned. Here it is advantageous to use
+an over-parameterization since homogeneous vectors can represent
+points at infinity.
+
+The ambient space dimension is required to be greater than 1.
+
+The class works with dynamic and static ambient space dimensions. If
+the ambient space dimensions is known at compile time use
+
+.. code-block:: c++
+
+   SphereManifold<3> manifold;
+
+If the ambient space dimensions is not known at compile time the
+template parameter needs to be set to `ceres::DYNAMIC` and the actual
+dimension needs to be provided as a constructor argument:
+
+.. code-block:: c++
+
+   SphereManifold<ceres::DYNAMIC> manifold(ambient_dim);
+
+For more details, please see Section B.2 (p.25) in [Hertzberg]_
+
+
+:class:`LineManifold`
+---------------------
+
+.. class:: LineManifold
+
+This class provides a manifold for lines, where the line is defined
+using an origin point and a direction vector. So the ambient size
+needs to be two times the dimension of the space in which the line
+lives.  The first half of the parameter block is interpreted as the
+origin point and the second half as the direction. This manifold is a
+special case of the `Affine Grassmannian manifold
+<https://en.wikipedia.org/wiki/Affine_Grassmannian_(manifold))>`_ for
+the case :math:`\operatorname{Graff}_1(R^n)`.
+
+Note that this is a manifold for a line, rather than a point
+constrained to lie on a line. It is useful when one wants to optimize
+over the space of lines. For example, given :math:`n` distinct points
+in 3D (measurements) we want to find the line that minimizes the sum
+of squared distances to all the points.
+
+:class:`AutoDiffManifold`
+=========================
+
+.. class:: AutoDiffManifold
+
+Create a :class:`Manifold` with Jacobians computed via automatic
+differentiation.
+
+To get an auto differentiated manifold, you must define a Functor with
+templated ``Plus`` and ``Minus`` functions that compute:
+
+.. code-block:: c++
+
+  x_plus_delta = Plus(x, delta);
+  y_minus_x    = Minus(y, x);
+
+Where, ``x``, ``y`` and ``x_plus_delta`` are vectors on the manifold in
+the ambient space (so they are ``kAmbientSize`` vectors) and
+``delta``, ``y_minus_x`` are vectors in the tangent space (so they are
+``kTangentSize`` vectors).
+
+The Functor should have the signature:
+
+.. code-block:: c++
+
+   struct Functor {
+    template <typename T>
+    bool Plus(const T* x, const T* delta, T* x_plus_delta) const;
+
+    template <typename T>
+    bool Minus(const T* y, const T* x, T* y_minus_x) const;
+   };
+
+
+Observe that  the ``Plus`` and  ``Minus`` operations are  templated on
+the parameter  ``T``.  The autodiff framework  substitutes appropriate
+``Jet``  objects for  ``T`` in  order to  compute the  derivative when
+necessary.  This  is  the  same  mechanism that  is  used  to  compute
+derivatives when using :class:`AutoDiffCostFunction`.
+
+``Plus`` and ``Minus`` should return true if the computation is
+successful and false otherwise, in which case the result will not be
+used.
+
+Given this Functor, the corresponding :class:`Manifold` can be constructed as:
+
+.. code-block:: c++
+
+   AutoDiffManifold<Functor, kAmbientSize, kTangentSize> manifold;
+
+.. NOTE::
+
+   The following is only used for illustration purposes. Ceres Solver
+   ships with an optimized, production grade :class:`QuaternionManifold`
+   implementation.
+
+As a concrete example consider the case of `Quaternions
+<https://en.wikipedia.org/wiki/Quaternion>`_. Quaternions form a three
+dimensional manifold embedded in :math:`\mathbb{R}^4`, i.e. they have
+an ambient dimension of 4 and their tangent space has dimension 3. The
+following Functor defines the ``Plus`` and ``Minus`` operations on the
+Quaternion manifold. It assumes that the quaternions are laid out as
+``[w,x,y,z]`` in memory, i.e. the real or scalar part is the first
+coordinate.
+
+.. code-block:: c++
+
+   struct QuaternionFunctor {
+     template <typename T>
+     bool Plus(const T* x, const T* delta, T* x_plus_delta) const {
+       const T squared_norm_delta =
+           delta[0] * delta[0] + delta[1] * delta[1] + delta[2] * delta[2];
+
+       T q_delta[4];
+       if (squared_norm_delta > T(0.0)) {
+         T norm_delta = sqrt(squared_norm_delta);
+         const T sin_delta_by_delta = sin(norm_delta) / norm_delta;
+         q_delta[0] = cos(norm_delta);
+         q_delta[1] = sin_delta_by_delta * delta[0];
+         q_delta[2] = sin_delta_by_delta * delta[1];
+         q_delta[3] = sin_delta_by_delta * delta[2];
+       } else {
+         // We do not just use q_delta = [1,0,0,0] here because that is a
+         // constant and when used for automatic differentiation will
+         // lead to a zero derivative. Instead we take a first order
+         // approximation and evaluate it at zero.
+         q_delta[0] = T(1.0);
+         q_delta[1] = delta[0];
+         q_delta[2] = delta[1];
+         q_delta[3] = delta[2];
+       }
+
+       QuaternionProduct(q_delta, x, x_plus_delta);
+       return true;
+     }
+
+     template <typename T>
+     bool Minus(const T* y, const T* x, T* y_minus_x) const {
+       T minus_x[4] = {x[0], -x[1], -x[2], -x[3]};
+       T ambient_y_minus_x[4];
+       QuaternionProduct(y, minus_x, ambient_y_minus_x);
+       T u_norm = sqrt(ambient_y_minus_x[1] * ambient_y_minus_x[1] +
+		       ambient_y_minus_x[2] * ambient_y_minus_x[2] +
+		       ambient_y_minus_x[3] * ambient_y_minus_x[3]);
+       if (u_norm > 0.0) {
+	 T theta = atan2(u_norm, ambient_y_minus_x[0]);
+	 y_minus_x[0] = theta * ambient_y_minus_x[1] / u_norm;
+	 y_minus_x[1] = theta * ambient_y_minus_x[2] / u_norm;
+	 y_minus_x[2] = theta * ambient_y_minus_x[3] / u_norm;
+       } else {
+	 We do not use [0,0,0] here because even though the value part is
+	 a constant, the derivative part is not.
+	 y_minus_x[0] = ambient_y_minus_x[1];
+	 y_minus_x[1] = ambient_y_minus_x[2];
+	 y_minus_x[2] = ambient_y_minus_x[3];
+       }
+       return true;
+     }
+   };
+
+
+Then given this struct, the auto differentiated Quaternion Manifold can now
+be constructed as
+
+.. code-block:: c++
+
+   Manifold* manifold = new AutoDiffManifold<QuaternionFunctor, 4, 3>;
 
 :class:`Problem`
 ================
@@ -1581,10 +1872,10 @@
 
    :func:`Problem::AddParameterBlock` explicitly adds a parameter
    block to the :class:`Problem`. Optionally it allows the user to
-   associate a :class:`LocalParameterization` object with the
-   parameter block too. Repeated calls with the same arguments are
-   ignored. Repeated calls with the same double pointer but a
-   different size results in undefined behavior.
+   associate a :class:`Manifold` object with the parameter block
+   too. Repeated calls with the same arguments are ignored. Repeated
+   calls with the same double pointer but a different size results in
+   undefined behavior.
 
    You can set any parameter block to be constant using
    :func:`Problem::SetParameterBlockConstant` and undo this using
@@ -1604,17 +1895,16 @@
    **Ownership**
 
    :class:`Problem` by default takes ownership of the
-   ``cost_function``, ``loss_function`` and ``local_parameterization``
-   pointers. These objects remain live for the life of the
-   :class:`Problem`. If the user wishes to keep control over the
-   destruction of these objects, then they can do this by setting the
-   corresponding enums in the :class:`Problem::Options` struct.
+   ``cost_function``, ``loss_function`` and ``manifold`` pointers. These
+   objects remain live for the life of the :class:`Problem`. If the user wishes
+   to keep control over the destruction of these objects, then they can do this
+   by setting the corresponding enums in the :class:`Problem::Options` struct.
 
-   Note that even though the Problem takes ownership of ``cost_function``
-   and ``loss_function``, it does not preclude the user from re-using
-   them in another residual block. The destructor takes care to call
-   delete on each ``cost_function`` or ``loss_function`` pointer only
-   once, regardless of how many residual blocks refer to them.
+   Note that even though the Problem takes ownership of objects,
+   ``cost_function`` and ``loss_function``, it does not preclude the
+   user from re-using them in another residual block. Similarly the
+   same ``manifold`` object can be used with multiple parameter blocks. The
+   destructor takes care to call delete on each owned object exactly once.
 
 .. class:: Problem::Options
 
@@ -1627,7 +1917,7 @@
    This option controls whether the Problem object owns the cost
    functions.
 
-   If set to TAKE_OWNERSHIP, then the problem object will delete the
+   If set to ``TAKE_OWNERSHIP``, then the problem object will delete the
    cost functions on destruction. The destructor is careful to delete
    the pointers only once, since sharing cost functions is allowed.
 
@@ -1638,21 +1928,19 @@
    This option controls whether the Problem object owns the loss
    functions.
 
-   If set to TAKE_OWNERSHIP, then the problem object will delete the
+   If set to ``TAKE_OWNERSHIP``, then the problem object will delete the
    loss functions on destruction. The destructor is careful to delete
    the pointers only once, since sharing loss functions is allowed.
 
-.. member:: Ownership Problem::Options::local_parameterization_ownership
+.. member:: Ownership Problem::Options::manifold_ownership
 
    Default: ``TAKE_OWNERSHIP``
 
-   This option controls whether the Problem object owns the local
-   parameterizations.
+   This option controls whether the Problem object owns the manifolds.
 
-   If set to TAKE_OWNERSHIP, then the problem object will delete the
-   local parameterizations on destruction. The destructor is careful
-   to delete the pointers only once, since sharing local
-   parameterizations is allowed.
+   If set to ``TAKE_OWNERSHIP``, then the problem object will delete the
+   manifolds on destruction. The destructor is careful to delete the
+   pointers only once, since sharing manifolds is allowed.
 
 .. member:: bool Problem::Options::enable_fast_removal
 
@@ -1689,12 +1977,13 @@
     overhead you want to avoid, then you can set
     disable_all_safety_checks to true.
 
-    **WARNING** Do not set this to true, unless you are absolutely
-    sure of what you are doing.
+    .. warning::
+        Do not set this to true, unless you are absolutely sure of what you are
+        doing.
 
 .. member:: Context* Problem::Options::context
 
-    Default: `nullptr`
+    Default: ``nullptr``
 
     A Ceres global context to use for solving this problem. This may
     help to reduce computation time as Ceres can reuse expensive
@@ -1705,7 +1994,7 @@
 
 .. member:: EvaluationCallback* Problem::Options::evaluation_callback
 
-    Default: `nullptr`
+    Default: ``nullptr``
 
     Using this callback interface, Ceres will notify you when it is
     about to evaluate the residuals or Jacobians.
@@ -1725,11 +2014,11 @@
 
        Evaluation callbacks are incompatible with inner iterations. So
        calling Solve with
-       :member:`Solver::Options::use_inner_iterations` set to `true`
+       :member:`Solver::Options::use_inner_iterations` set to ``true``
        on a :class:`Problem` with a non-null evaluation callback is an
        error.
 
-.. function:: ResidualBlockId Problem::AddResidualBlock(CostFunction* cost_function, LossFunction* loss_function, const vector<double*> parameter_blocks)
+.. function:: ResidualBlockId Problem::AddResidualBlock(CostFunction* cost_function, LossFunction* loss_function, const std::vector<double*> parameter_blocks)
 
 .. function:: template <typename Ts...> ResidualBlockId Problem::AddResidualBlock(CostFunction* cost_function, LossFunction* loss_function, double* x0, Ts... xs)
 
@@ -1738,7 +2027,7 @@
    parameter blocks it expects. The function checks that these match
    the sizes of the parameter blocks listed in parameter_blocks. The
    program aborts if a mismatch is detected. loss_function can be
-   `nullptr`, in which case the cost of the term is just the squared
+   ``nullptr``, in which case the cost of the term is just the squared
    norm of the residuals.
 
    The parameter blocks may be passed together as a
@@ -1756,11 +2045,12 @@
    keep control over the destruction of these objects, then they can
    do this by setting the corresponding enums in the Options struct.
 
-   Note: Even though the Problem takes ownership of cost_function
-   and loss_function, it does not preclude the user from re-using
-   them in another residual block. The destructor takes care to call
-   delete on each cost_function or loss_function pointer only once,
-   regardless of how many residual blocks refer to them.
+   .. note::
+       Even though the Problem takes ownership of ``cost_function``
+       and ``loss_function``, it does not preclude the user from re-using
+       them in another residual block. The destructor takes care to call
+       delete on each cost_function or loss_function pointer only once,
+       regardless of how many residual blocks refer to them.
 
    Example usage:
 
@@ -1769,9 +2059,9 @@
       double x1[] = {1.0, 2.0, 3.0};
       double x2[] = {1.0, 2.0, 5.0, 6.0};
       double x3[] = {3.0, 6.0, 2.0, 5.0, 1.0};
-      vector<double*> v1;
+      std::vector<double*> v1;
       v1.push_back(x1);
-      vector<double*> v2;
+      std::vector<double*> v2;
       v2.push_back(x2);
       v2.push_back(x1);
 
@@ -1782,12 +2072,19 @@
       problem.AddResidualBlock(new MyUnaryCostFunction(...), nullptr, v1);
       problem.AddResidualBlock(new MyBinaryCostFunction(...), nullptr, v2);
 
-.. function:: void Problem::AddParameterBlock(double* values, int size, LocalParameterization* local_parameterization)
+.. function:: void Problem::AddParameterBlock(double* values, int size, Manifold* manifold)
 
-   Add a parameter block with appropriate size to the problem.
+   Add a parameter block with appropriate size and Manifold to the
+   problem. It is okay for ``manifold`` to be ``nullptr``.
+
    Repeated calls with the same arguments are ignored. Repeated calls
-   with the same double pointer but a different size results in
-   undefined behavior.
+   with the same double pointer but a different size results in a crash
+   (unless :member:`Solver::Options::disable_all_safety_checks` is set to true).
+
+   Repeated calls with the same double pointer and size but different
+   :class:`Manifold` is equivalent to calling `SetManifold(manifold)`,
+   i.e., any previously associated :class:`Manifold` object will be replaced
+   with the `manifold`.
 
 .. function:: void Problem::AddParameterBlock(double* values, int size)
 
@@ -1798,35 +2095,46 @@
 
 .. function:: void Problem::RemoveResidualBlock(ResidualBlockId residual_block)
 
-   Remove a residual block from the problem. Any parameters that the residual
-   block depends on are not removed. The cost and loss functions for the
-   residual block will not get deleted immediately; won't happen until the
-   problem itself is deleted.  If Problem::Options::enable_fast_removal is
-   true, then the removal is fast (almost constant time). Otherwise, removing a
-   residual block will incur a scan of the entire Problem object to verify that
-   the residual_block represents a valid residual in the problem.
+   Remove a residual block from the problem.
 
-   **WARNING:** Removing a residual or parameter block will destroy
-   the implicit ordering, rendering the jacobian or residuals returned
-   from the solver uninterpretable. If you depend on the evaluated
-   jacobian, do not use remove! This may change in a future release.
-   Hold the indicated parameter block constant during optimization.
+   Since residual blocks are allowed to share cost function and loss
+   function objects, Ceres Solver uses a reference counting
+   mechanism. So when a residual block is deleted, the reference count
+   for the corresponding cost function and loss function objects are
+   decreased and when this count reaches zero, they are deleted.
+
+   If :member:`Problem::Options::enable_fast_removal` is ``true``, then the removal
+   is fast (almost constant time). Otherwise it is linear, requiring a
+   scan of the entire problem.
+
+   Removing a residual block has no effect on the parameter blocks
+   that the problem depends on.
+
+   .. warning::
+       Removing a residual or parameter block will destroy the implicit
+       ordering, rendering the jacobian or residuals returned from the solver
+       uninterpretable. If you depend on the evaluated jacobian, do not use
+       remove! This may change in a future release. Hold the indicated parameter
+       block constant during optimization.
 
 .. function:: void Problem::RemoveParameterBlock(const double* values)
 
-   Remove a parameter block from the problem. The parameterization of
-   the parameter block, if it exists, will persist until the deletion
-   of the problem (similar to cost/loss functions in residual block
-   removal). Any residual blocks that depend on the parameter are also
-   removed, as described above in RemoveResidualBlock().  If
-   Problem::Options::enable_fast_removal is true, then
-   the removal is fast (almost constant time). Otherwise, removing a
-   parameter block will incur a scan of the entire Problem object.
+   Remove a parameter block from the problem. Any residual blocks that
+   depend on the parameter are also removed, as described above in
+   :func:`RemoveResidualBlock()`.
 
-   **WARNING:** Removing a residual or parameter block will destroy
-   the implicit ordering, rendering the jacobian or residuals returned
-   from the solver uninterpretable. If you depend on the evaluated
-   jacobian, do not use remove! This may change in a future release.
+   The manifold of the parameter block, if it exists, will persist until the
+   deletion of the problem.
+
+   If :member:`Problem::Options::enable_fast_removal` is ``true``, then the removal
+   is fast (almost constant time). Otherwise, removing a parameter
+   block will scan the entire Problem.
+
+   .. warning::
+       Removing a residual or parameter block will destroy the implicit
+       ordering, rendering the jacobian or residuals returned from the solver
+       uninterpretable. If you depend on the evaluated jacobian, do not use
+       remove! This may change in a future release.
 
 .. function:: void Problem::SetParameterBlockConstant(const double* values)
 
@@ -1841,23 +2149,34 @@
    Returns ``true`` if a parameter block is set constant, and false
    otherwise. A parameter block may be set constant in two ways:
    either by calling ``SetParameterBlockConstant`` or by associating a
-   ``LocalParameterization`` with a zero dimensional tangent space
-   with it.
+   :class:`Manifold` with a zero dimensional tangent space with it.
 
-.. function:: void Problem::SetParameterization(double* values, LocalParameterization* local_parameterization)
+.. function:: void SetManifold(double* values, Manifold* manifold);
 
-   Set the local parameterization for one of the parameter blocks.
-   The local_parameterization is owned by the Problem by default. It
-   is acceptable to set the same parameterization for multiple
-   parameters; the destructor is careful to delete local
-   parameterizations only once. Calling `SetParameterization` with
-   `nullptr` will clear any previously set parameterization.
+   Set the :class:`Manifold` for the parameter block. Calling
+   :func:`Problem::SetManifold` with ``nullptr`` will clear any
+   previously set :class:`Manifold` for the parameter block.
 
-.. function:: LocalParameterization* Problem::GetParameterization(const double* values) const
+   Repeated calls will result in any previously associated
+   :class:`Manifold` object to be replaced with ``manifold``.
 
-   Get the local parameterization object associated with this
-   parameter block. If there is no parameterization object associated
-   then `nullptr` is returned
+   ``manifold`` is owned by :class:`Problem` by default (See
+   :class:`Problem::Options` to override this behaviour).
+
+   It is acceptable to set the same :class:`Manifold` for multiple
+   parameter blocks.
+
+.. function:: const Manifold* GetManifold(const double* values) const;
+
+   Get the :class:`Manifold` object associated with this parameter block.
+
+   If there is no :class:`Manifold` object associated with the parameter block,
+   then ``nullptr`` is returned.
+
+.. function:: bool HasManifold(const double* values) const;
+
+   Returns ``true`` if a :class:`Manifold` is associated with this parameter
+   block, ``false`` otherwise.
 
 .. function:: void Problem::SetParameterLowerBound(double* values, int index, double lower_bound)
 
@@ -1909,43 +2228,42 @@
 
    The size of the parameter block.
 
-.. function:: int Problem::ParameterBlockLocalSize(const double* values) const
+.. function:: int Problem::ParameterBlockTangentSize(const double* values) const
 
-   The size of local parameterization for the parameter block. If
-   there is no local parameterization associated with this parameter
-   block, then ``ParameterBlockLocalSize`` = ``ParameterBlockSize``.
+   The dimension of the tangent space of the :class:`Manifold` for the
+   parameter block. If there is no :class:`Manifold` associated with this
+   parameter block, then ``ParameterBlockTangentSize = ParameterBlockSize``.
 
 .. function:: bool Problem::HasParameterBlock(const double* values) const
 
    Is the given parameter block present in the problem or not?
 
-.. function:: void Problem::GetParameterBlocks(vector<double*>* parameter_blocks) const
+.. function:: void Problem::GetParameterBlocks(std::vector<double*>* parameter_blocks) const
 
    Fills the passed ``parameter_blocks`` vector with pointers to the
    parameter blocks currently in the problem. After this call,
    ``parameter_block.size() == NumParameterBlocks``.
 
-.. function:: void Problem::GetResidualBlocks(vector<ResidualBlockId>* residual_blocks) const
+.. function:: void Problem::GetResidualBlocks(std::vector<ResidualBlockId>* residual_blocks) const
 
    Fills the passed `residual_blocks` vector with pointers to the
    residual blocks currently in the problem. After this call,
    `residual_blocks.size() == NumResidualBlocks`.
 
-.. function:: void Problem::GetParameterBlocksForResidualBlock(const ResidualBlockId residual_block, vector<double*>* parameter_blocks) const
+.. function:: void Problem::GetParameterBlocksForResidualBlock(const ResidualBlockId residual_block, std::vector<double*>* parameter_blocks) const
 
    Get all the parameter blocks that depend on the given residual
    block.
 
-.. function:: void Problem::GetResidualBlocksForParameterBlock(const double* values, vector<ResidualBlockId>* residual_blocks) const
+.. function:: void Problem::GetResidualBlocksForParameterBlock(const double* values, std::vector<ResidualBlockId>* residual_blocks) const
 
    Get all the residual blocks that depend on the given parameter
    block.
 
-   If `Problem::Options::enable_fast_removal` is
-   `true`, then getting the residual blocks is fast and depends only
+   If :member:`Problem::Options::enable_fast_removal` is
+   ``true``, then getting the residual blocks is fast and depends only
    on the number of residual blocks. Otherwise, getting the residual
-   blocks for a parameter block will incur a scan of the entire
-   :class:`Problem` object.
+   blocks for a parameter block will scan the entire problem.
 
 .. function:: const CostFunction* Problem::GetCostFunctionForResidualBlock(const ResidualBlockId residual_block) const
 
@@ -1974,11 +2292,10 @@
    function returns false, the caller should expect the output
    memory locations to have been modified.
 
-   The returned cost and jacobians have had robustification and local
-   parameterizations applied already; for example, the jacobian for a
-   4-dimensional quaternion parameter using the
-   :class:`QuaternionParameterization` is ``num_residuals x 3``
-   instead of ``num_residuals x 4``.
+   The returned cost and jacobians have had robustification and
+   :class:`Manifold` applied already; for example, the jacobian for a
+   4-dimensional quaternion parameter using the :class:`QuaternionManifold` is
+   ``num_residuals x 3`` instead of ``num_residuals x 4``.
 
    ``apply_loss_function`` as the name implies allows the user to
    switch the application of the loss function on and off.
@@ -2014,10 +2331,10 @@
     :func:`Problem::EvaluateResidualBlock`).
 
 
-.. function:: bool Problem::Evaluate(const Problem::EvaluateOptions& options, double* cost, vector<double>* residuals, vector<double>* gradient, CRSMatrix* jacobian)
+.. function:: bool Problem::Evaluate(const Problem::EvaluateOptions& options, double* cost, std::vector<double>* residuals, std::vector<double>* gradient, CRSMatrix* jacobian)
 
    Evaluate a :class:`Problem`. Any of the output pointers can be
-   `nullptr`. Which residual blocks and parameter blocks are used is
+   ``nullptr``. Which residual blocks and parameter blocks are used is
    controlled by the :class:`Problem::EvaluateOptions` struct below.
 
    .. NOTE::
@@ -2048,10 +2365,10 @@
 
    .. NOTE::
 
-      If no local parameterizations are used, then the size of
-      the gradient vector is the sum of the sizes of all the parameter
-      blocks. If a parameter block has a local parameterization, then
-      it contributes "LocalSize" entries to the gradient vector.
+      If no :class:`Manifold` are used, then the size of the gradient vector is
+      the sum of the sizes of all the parameter blocks. If a parameter block has
+      a manifold then it contributes "TangentSize" entries to the gradient
+      vector.
 
    .. NOTE::
 
@@ -2070,7 +2387,7 @@
 
    Options struct that is used to control :func:`Problem::Evaluate`.
 
-.. member:: vector<double*> Problem::EvaluateOptions::parameter_blocks
+.. member:: std::vector<double*> Problem::EvaluateOptions::parameter_blocks
 
    The set of parameter blocks for which evaluation should be
    performed. This vector determines the order in which parameter
@@ -2086,7 +2403,7 @@
    should NOT point to new memory locations. Bad things will happen if
    you do.
 
-.. member:: vector<ResidualBlockId> Problem::EvaluateOptions::residual_blocks
+.. member:: std::vector<ResidualBlockId> Problem::EvaluateOptions::residual_blocks
 
    The set of residual blocks for which evaluation should be
    performed. This vector determines the order in which the residuals
@@ -2105,7 +2422,7 @@
 
 .. member:: int Problem::EvaluateOptions::num_threads
 
-   Number of threads to use. (Requires OpenMP).
+   Number of threads to use.
 
 
 :class:`EvaluationCallback`
@@ -2120,9 +2437,9 @@
 
       class EvaluationCallback {
        public:
-        virtual ~EvaluationCallback() {}
-        virtual void PrepareForEvaluation()(bool evaluate_jacobians
-                                            bool new_evaluation_point) = 0;
+        virtual ~EvaluationCallback();
+        virtual void PrepareForEvaluation(bool evaluate_jacobians,
+                                          bool new_evaluation_point) = 0;
       };
 
 .. function:: void EvaluationCallback::PrepareForEvaluation(bool evaluate_jacobians, bool new_evaluation_point)
@@ -2131,14 +2448,14 @@
    every time, and once before it computes the residuals and/or the
    Jacobians.
 
-   User parameters (the double* values provided by the us) are fixed
+   User parameters (the double* values provided by the user) are fixed
    until the next call to
    :func:`EvaluationCallback::PrepareForEvaluation`. If
    ``new_evaluation_point == true``, then this is a new point that is
    different from the last evaluated point. Otherwise, it is the same
    point that was evaluated previously (either Jacobian or residual)
    and the user can use cached results from previous evaluations. If
-   ``evaluate_jacobians`` is true, then Ceres will request Jacobians
+   ``evaluate_jacobians`` is ``true``, then Ceres will request Jacobians
    in the upcoming cost evaluation.
 
    Using this callback interface, Ceres can notify you when it is
@@ -2165,7 +2482,10 @@
    to use a global shared variable (discouraged; bug-prone).  As far
    as Ceres is concerned, it is evaluating cost functions like any
    other; it just so happens that behind the scenes the cost functions
-   reuse pre-computed data to execute faster.
+   reuse pre-computed data to execute faster. See
+   `examples/evaluation_callback_example.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/evaluation_callback_example.cc>`_
+   for an example.
 
    See ``evaluation_callback_test.cc`` for code that explicitly
    verifies the preconditions between
@@ -2207,14 +2527,14 @@
 .. function:: template <typename T> void RotationMatrixToAngleAxis(T const * R, T * angle_axis)
 .. function:: template <typename T> void AngleAxisToRotationMatrix(T const * angle_axis, T * R)
 
-   Conversions between 3x3 rotation matrix with given column and row strides and
+   Conversions between :math:`3\times3` rotation matrix with given column and row strides and
    axis-angle rotation representations. The functions that take a pointer to T instead
    of a MatrixAdapter assume a column major representation with unit row stride and a column stride of 3.
 
 .. function:: template <typename T, int row_stride, int col_stride> void EulerAnglesToRotationMatrix(const T* euler, const MatrixAdapter<T, row_stride, col_stride>& R)
 .. function:: template <typename T> void EulerAnglesToRotationMatrix(const T* euler, int row_stride, T* R)
 
-   Conversions between 3x3 rotation matrix with given column and row strides and
+   Conversions between :math:`3\times3` rotation matrix with given column and row strides and
    Euler angle (in degrees) rotation representations.
 
    The {pitch,roll,yaw} Euler angles are rotations around the {x,y,z}
@@ -2228,7 +2548,7 @@
 .. function:: template <typename T, int row_stride, int col_stride> void QuaternionToScaledRotation(const T q[4], const MatrixAdapter<T, row_stride, col_stride>& R)
 .. function:: template <typename T> void QuaternionToScaledRotation(const T q[4], T R[3 * 3])
 
-   Convert a 4-vector to a 3x3 scaled rotation matrix.
+   Convert a 4-vector to a :math:`3\times3` scaled rotation matrix.
 
    The choice of rotation is such that the quaternion
    :math:`\begin{bmatrix} 1 &0 &0 &0\end{bmatrix}` goes to an identity
@@ -2242,8 +2562,8 @@
 
    which corresponds to a Rodrigues approximation, the last matrix
    being the cross-product matrix of :math:`\begin{bmatrix} a& b&
-   c\end{bmatrix}`. Together with the property that :math:`R(q1 * q2)
-   = R(q1) * R(q2)` this uniquely defines the mapping from :math:`q` to
+   c\end{bmatrix}`. Together with the property that :math:`R(q_1 \otimes q_2)
+   = R(q_1) R(q_2)` this uniquely defines the mapping from :math:`q` to
    :math:`R`.
 
    In the function that accepts a pointer to T instead of a MatrixAdapter,
@@ -2252,7 +2572,7 @@
 
    No normalization of the quaternion is performed, i.e.
    :math:`R = \|q\|^2  Q`, where :math:`Q` is an orthonormal matrix
-   such that :math:`\det(Q) = 1` and :math:`Q*Q' = I`.
+   such that :math:`\det(Q) = 1` and :math:`QQ' = I`.
 
 
 .. function:: template <typename T> void QuaternionToRotation(const T q[4], const MatrixAdapter<T, row_stride, col_stride>& R)
@@ -2279,9 +2599,9 @@
 
 .. function:: template <typename T> void QuaternionProduct(const T z[4], const T w[4], T zw[4])
 
-   .. math:: zw = z * w
+   .. math:: zw = z \otimes w
 
-   where :math:`*` is the Quaternion product between 4-vectors.
+   where :math:`\otimes` is the Quaternion product between 4-vectors.
 
 
 .. function:: template <typename T> void CrossProduct(const T x[3], const T y[3], T x_cross_y[3])
diff --git a/docs/source/nnls_solving.rst b/docs/source/nnls_solving.rst
index 285df3a..184b94b 100644
--- a/docs/source/nnls_solving.rst
+++ b/docs/source/nnls_solving.rst
@@ -1,3 +1,4 @@
+.. highlight:: c++
 
 .. default-domain:: cpp
 
@@ -12,10 +13,10 @@
 Introduction
 ============
 
-Effective use of Ceres requires some familiarity with the basic
+Effective use of Ceres Solver requires some familiarity with the basic
 components of a non-linear least squares solver, so before we describe
 how to configure and use the solver, we will take a brief look at how
-some of the core optimization algorithms in Ceres work.
+some of the core optimization algorithms in Ceres Solver work.
 
 Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
 variables, and
@@ -27,15 +28,15 @@
           L \le x \le U
   :label: nonlinsq
 
-Where, :math:`L` and :math:`U` are lower and upper bounds on the
-parameter vector :math:`x`.
+Where, :math:`L` and :math:`U` are vector lower and upper bounds on
+the parameter vector :math:`x`. The inequality holds component-wise.
 
 Since the efficient global minimization of :eq:`nonlinsq` for
 general :math:`F(x)` is an intractable problem, we will have to settle
 for finding a local minimum.
 
 In the following, the Jacobian :math:`J(x)` of :math:`F(x)` is an
-:math:`m\times n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)`
+:math:`m\times n` matrix, where :math:`J_{ij}(x) = D_j f_i(x)`
 and the gradient vector is :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2
 = J(x)^\top F(x)`.
 
@@ -75,7 +76,7 @@
 Trust region methods are in some sense dual to line search methods:
 trust region methods first choose a step size (the size of the trust
 region) and then a step direction while line search methods first
-choose a step direction and then a step size. Ceres implements
+choose a step direction and then a step size. Ceres Solver implements
 multiple algorithms in both categories.
 
 .. _section-trust-region-methods:
@@ -152,8 +153,9 @@
 solving an unconstrained optimization of the form
 
 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 +\lambda  \|D(x)\Delta x\|^2
+   :label: lsqr-naive
 
-Where, :math:`\lambda` is a Lagrange multiplier that is inverse
+Where, :math:`\lambda` is a Lagrange multiplier that is inversely
 related to :math:`\mu`. In Ceres, we solve for
 
 .. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
@@ -162,39 +164,47 @@
 The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
 the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.
 
-Before going further, let us make some notational simplifications. We
-will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been concatenated
-at the bottom of the matrix :math:`J` and similarly a vector of zeros
-has been added to the bottom of the vector :math:`f` and the rest of
-our discussion will be in terms of :math:`J` and :math:`F`, i.e, the
-linear least squares problem.
+Before going further, let us make some notational simplifications.
+
+We will assume that the matrix :math:`\frac{1}{\sqrt{\mu}} D` has been
+concatenated at the bottom of the matrix :math:`J(x)` and a
+corresponding vector of zeroes has been added to the bottom of
+:math:`F(x)`, i.e.:
+
+.. math:: J(x) = \begin{bmatrix} J(x) \\ \frac{1}{\sqrt{\mu}} D
+          \end{bmatrix},\quad F(x) = \begin{bmatrix} F(x) \\ 0
+          \end{bmatrix}.
+
+This allows us to re-write :eq:`lsqr` as
 
 .. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2 .
    :label: simple
 
-For all but the smallest problems the solution of :eq:`simple` in
-each iteration of the Levenberg-Marquardt algorithm is the dominant
-computational cost in Ceres. Ceres provides a number of different
-options for solving :eq:`simple`. There are two major classes of
-methods - factorization and iterative.
+and only talk about :math:`J(x)` and :math:`F(x)` going forward.
+
+For all but the smallest problems the solution of :eq:`simple` in each
+iteration of the Levenberg-Marquardt algorithm is the dominant
+computational cost. Ceres provides a number of different options for
+solving :eq:`simple`. There are two major classes of methods -
+factorization and iterative.
 
 The factorization methods are based on computing an exact solution of
-:eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
-step Levenberg-Marquardt algorithm. But it is not clear if an exact
-solution of :eq:`lsqr` is necessary at each step of the LM algorithm
-to solve :eq:`nonlinsq`. In fact, we have already seen evidence
-that this may not be the case, as :eq:`lsqr` is itself a regularized
-version of :eq:`linearapprox`. Indeed, it is possible to
-construct non-linear optimization algorithms in which the linearized
-problem is solved approximately. These algorithms are known as inexact
-Newton or truncated Newton methods [NocedalWright]_.
+:eq:`lsqr` using a Cholesky or a QR factorization and lead to the so
+called exact step Levenberg-Marquardt algorithm. But it is not clear
+if an exact solution of :eq:`lsqr` is necessary at each step of the
+Levenberg-Mardquardt algorithm.  We have already seen evidence that
+this may not be the case, as :eq:`lsqr` is itself a regularized
+version of :eq:`linearapprox`. Indeed, it is possible to construct
+non-linear optimization algorithms in which the linearized problem is
+solved approximately. These algorithms are known as inexact Newton or
+truncated Newton methods [NocedalWright]_.
 
 An inexact Newton method requires two ingredients. First, a cheap
 method for approximately solving systems of linear
 equations. Typically an iterative linear solver like the Conjugate
-Gradients method is used for this
-purpose [NocedalWright]_. Second, a termination rule for
-the iterative solver. A typical termination rule is of the form
+Gradients method is used for this purpose [NocedalWright]_. Second, a
+termination rule for the iterative solver. A typical termination rule
+is of the form
 
 .. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
    :label: inexact
@@ -212,14 +222,18 @@
 iterative linear solver, the inexact step Levenberg-Marquardt
 algorithm is used.
 
+We will talk more about the various linear solvers that you can use in
+:ref:`section-linear-solver`.
+
 .. _section-dogleg:
 
 Dogleg
 ------
 
 Another strategy for solving the trust region problem :eq:`trp` was
-introduced by M. J. D. Powell. The key idea there is to compute two
-vectors
+introduced by
+`M. J. D. Powell <https://en.wikipedia.org/wiki/Michael_J._D._Powell>`_. The
+key idea there is to compute two vectors
 
 .. math::
 
@@ -253,10 +267,14 @@
 Levenberg-Marquardt solves the linear approximation from scratch with
 a smaller value of :math:`\mu`. Dogleg on the other hand, only needs
 to compute the interpolation between the Gauss-Newton and the Cauchy
-vectors, as neither of them depend on the value of :math:`\mu`.
+vectors, as neither of them depend on the value of :math:`\mu`. As a
+result the Dogleg method only solves one linear system per successful
+step, while Levenberg-Marquardt may need to solve an arbitrary number
+of linear systems before it can make progress [LourakisArgyros]_.
 
-The Dogleg method can only be used with the exact factorization based
-linear solvers.
+A disadvantage of the Dogleg implementation in Ceres Solver is that is
+can only be used with method can only be used with exact factorization
+based linear solvers.
 
 .. _section-inner-iterations:
 
@@ -349,10 +367,10 @@
 enables the non-monotonic trust region algorithm as described by Conn,
 Gould & Toint in [Conn]_.
 
-Even though the value of the objective function may be larger
-than the minimum value encountered over the course of the
-optimization, the final parameters returned to the user are the
-ones corresponding to the minimum cost over all iterations.
+Even though the value of the objective function may be larger than the
+minimum value encountered over the course of the optimization, the
+final parameters returned to the user are the ones corresponding to
+the minimum cost over all iterations.
 
 The option to take non-monotonic steps is available for all trust
 region strategies.
@@ -363,11 +381,13 @@
 Line Search Methods
 ===================
 
-The line search method in Ceres Solver cannot handle bounds
-constraints right now, so it can only be used for solving
-unconstrained problems.
+.. NOTE::
 
-Line search algorithms
+   The line search method in Ceres Solver cannot handle bounds
+   constraints right now, so it can only be used for solving
+   unconstrained problems.
+
+The basic line search algorithm looks something like this:
 
    1. Given an initial point :math:`x`
    2. :math:`\Delta x = -H^{-1}(x) g(x)`
@@ -387,7 +407,7 @@
 direction :math:`\Delta x` and the method used for one dimensional
 optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
 primary source of computational complexity in these
-methods. Currently, Ceres Solver supports three choices of search
+methods. Currently, Ceres Solver supports four choices of search
 directions, all aimed at large scale problems.
 
 1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
@@ -405,8 +425,8 @@
 3. ``BFGS`` A generalization of the Secant method to multiple
    dimensions in which a full, dense approximation to the inverse
    Hessian is maintained and used to compute a quasi-Newton step
-   [NocedalWright]_.  BFGS is currently the best known general
-   quasi-Newton algorithm.
+   [NocedalWright]_.  ``BFGS`` and its limited memory variant ``LBFGS``
+   are currently the best known general quasi-Newton algorithm.
 
 4. ``LBFGS`` A limited memory approximation to the full ``BFGS``
    method in which the last `M` iterations are used to approximate the
@@ -414,26 +434,31 @@
    [ByrdNocedal]_.
 
 Currently Ceres Solver supports both a backtracking and interpolation
-based Armijo line search algorithm, and a sectioning / zoom
-interpolation (strong) Wolfe condition line search algorithm.
-However, note that in order for the assumptions underlying the
-``BFGS`` and ``LBFGS`` methods to be guaranteed to be satisfied the
-Wolfe line search algorithm should be used.
+based `Armijo line search algorithm
+<https://en.wikipedia.org/wiki/Backtracking_line_search>`_ (``ARMIJO``)
+, and a sectioning / zoom interpolation (strong) `Wolfe condition line
+search algorithm <https://en.wikipedia.org/wiki/Wolfe_conditions>`_
+(``WOLFE``).
+
+.. NOTE::
+
+   In order for the assumptions underlying the ``BFGS`` and ``LBFGS``
+   methods to be satisfied the ``WOLFE`` algorithm must be used.
 
 .. _section-linear-solver:
 
-LinearSolver
-============
+Linear Solvers
+==============
 
-Recall that in both of the trust-region methods described above, the
+Observe that for both of the trust-region methods described above, the
 key computational cost is the solution of a linear least squares
 problem of the form
 
-.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
+.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + F(x)\|^2 .
    :label: simple2
 
 Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
-f(x)`. For notational convenience let us also drop the dependence on
+F(x)`. For notational convenience let us also drop the dependence on
 :math:`x`. Then it is easy to see that solving :eq:`simple2` is
 equivalent to solving the *normal equations*.
 
@@ -444,31 +469,65 @@
 
 .. _section-qr:
 
-``DENSE_QR``
-------------
+DENSE_QR
+--------
 
 For small problems (a couple of hundred parameters and a few thousand
-residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
-of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
-:math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
-an upper triangular matrix [TrefethenBau]_. Then it can be shown that
-the solution to :eq:`normal` is given by
+residuals) with relatively dense Jacobians, QR-decomposition is the
+method of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition
+of :math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R`
+is an upper triangular matrix [TrefethenBau]_. Then it can be shown
+that the solution to :eq:`normal` is given by
 
 .. math:: \Delta x^* = -R^{-1}Q^\top f
 
+You can use QR-decomposition by setting
+:member:`Solver::Options::linear_solver_type` to ``DENSE_QR``.
 
-Ceres uses ``Eigen`` 's dense QR factorization routines.
+By default (``Solver::Options::dense_linear_algebra_library_type =
+EIGEN``) Ceres Solver will use `Eigen Householder QR factorization
+<https://eigen.tuxfamily.org/dox-devel/classEigen_1_1HouseholderQR.html>`_
+.
 
-.. _section-cholesky:
+If Ceres Solver has been built with an optimized LAPACK
+implementation, then the user can also choose to use LAPACK's
+`DGEQRF`_ routine by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``LAPACK``. Depending on the `LAPACK` and the underlying `BLAS`
+implementation this may perform better than using Eigen's Householder
+QR factorization.
 
-``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
-------------------------------------------------------
+.. _DGEQRF: https://netlib.org/lapack/explore-html/df/dc5/group__variants_g_ecomputational_ga3766ea903391b5cf9008132f7440ec7b.html
 
-Large non-linear least square problems are usually sparse. In such
-cases, using a dense QR factorization is inefficient. Let :math:`H =
-R^\top R` be the Cholesky factorization of the normal equations, where
-:math:`R` is an upper triangular matrix, then the solution to
-:eq:`normal` is given by
+
+If an NVIDIA GPU is available and Ceres Solver has been built with
+CUDA support enabled, then the user can also choose to perform the
+QR-decomposition on the GPU by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``CUDA``. Depending on the GPU this can lead to a substantial
+speedup. Using CUDA only makes sense for moderate to large sized
+problems. This is because to perform the decomposition on the GPU the
+matrix :math:`J` needs to be transferred from the CPU to the GPU and
+this incurs a cost. So unless the speedup from doing the decomposition
+on the GPU is large enough to also account for the time taken to
+transfer the Jacobian to the GPU, using CUDA will not be better than
+just doing the decomposition on the CPU.
+
+.. _section-dense-normal-cholesky:
+
+DENSE_NORMAL_CHOLESKY
+---------------------
+
+It is often the case that the number of rows in the Jacobian :math:`J`
+are much larger than the the number of columns. The complexity of QR
+factorization scales linearly with the number of rows, so beyond a
+certain size it is more efficient to solve :eq:`normal` using a dense
+`Cholesky factorization
+<https://en.wikipedia.org/wiki/Cholesky_decomposition>`_.
+
+Let :math:`H = R^\top R` be the Cholesky factorization of the normal
+equations, where :math:`R` is an upper triangular matrix, then the
+solution to :eq:`normal` is given by
 
 .. math::
 
@@ -479,69 +538,155 @@
 factorization of :math:`H` is the same upper triangular matrix
 :math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
 orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
-Q^\top Q R = R^\top R`. There are two variants of Cholesky
-factorization -- sparse and dense.
+Q^\top Q R = R^\top R`.
 
-``DENSE_NORMAL_CHOLESKY``  as the name implies performs a dense
-Cholesky factorization of the normal equations. Ceres uses
-``Eigen`` 's dense LDLT factorization routines.
+Unfortunately, forming the matrix :math:`H = J'J` squares the
+condition number. As a result while the cost of forming :math:`H` and
+computing its Cholesky factorization is lower than computing the
+QR-factorization of :math:`J`, we pay the price in terms of increased
+numerical instability and potential failure of the Cholesky
+factorization for ill-conditioned Jacobians.
 
-``SPARSE_NORMAL_CHOLESKY``, as the name implies performs a sparse
-Cholesky factorization of the normal equations. This leads to
-substantial savings in time and memory for large sparse
-problems. Ceres uses the sparse Cholesky factorization routines in
-Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_
-or the sparse Cholesky factorization algorithm in ``Eigen`` (which
-incidently is a port of the algorithm implemented inside ``CXSparse``)
+You can use dense Cholesky factorization by setting
+:member:`Solver::Options::linear_solver_type` to
+``DENSE_NORMAL_CHOLESKY``.
+
+By default (``Solver::Options::dense_linear_algebra_library_type =
+EIGEN``) Ceres Solver will use `Eigen's LLT factorization`_ routine.
+
+.. _Eigen's LLT Factorization:  https://eigen.tuxfamily.org/dox/classEigen_1_1LLT.html
+
+If Ceres Solver has been built with an optimized LAPACK
+implementation, then the user can also choose to use LAPACK's
+`DPOTRF`_ routine by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``LAPACK``. Depending on the `LAPACK` and the underlying `BLAS`
+implementation this may perform better than using Eigen's Cholesky
+factorization.
+
+.. _DPOTRF: https://www.netlib.org/lapack/explore-html/d1/d7a/group__double_p_ocomputational_ga2f55f604a6003d03b5cd4a0adcfb74d6.html
+
+If an NVIDIA GPU is available and Ceres Solver has been built with
+CUDA support enabled, then the user can also choose to perform the
+Cholesky factorization on the GPU by setting
+:member:`Solver::Options::dense_linear_algebra_library_type` to
+``CUDA``. Depending on the GPU this can lead to a substantial speedup.
+Using CUDA only makes sense for moderate to large sized problems. This
+is because to perform the decomposition on the GPU the matrix
+:math:`H` needs to be transferred from the CPU to the GPU and this
+incurs a cost. So unless the speedup from doing the decomposition on
+the GPU is large enough to also account for the time taken to transfer
+the Jacobian to the GPU, using CUDA will not be better than just doing
+the decomposition on the CPU.
+
+
+.. _section-sparse-normal-cholesky:
+
+SPARSE_NORMAL_CHOLESKY
+----------------------
+
+Large non-linear least square problems are usually sparse. In such
+cases, using a dense QR or Cholesky factorization is inefficient. For
+such problems, Cholesky factorization routines which treat :math:`H`
+as a sparse matrix and computes a sparse factor :math:`R` are better
+suited [Davis]_. This can lead to substantial savings in memory and
+CPU time for large sparse problems.
+
+You can use dense Cholesky factorization by setting
+:member:`Solver::Options::linear_solver_type` to
+``SPARSE_NORMAL_CHOLESKY``.
+
+The use of this linear solver requires that Ceres is compiled with
+support for at least one of:
+
+ 1. `SuiteSparse <https://people.engr.tamu.edu/davis/suitesparse.html>`_ (``SUITE_SPARSE``).
+ 2. `Apple's Accelerate framework
+    <https://developer.apple.com/documentation/accelerate/sparse_solvers?language=objc>`_
+    (``ACCELERATE_SPARSE``).
+ 3. `Eigen's sparse linear solvers
+    <https://eigen.tuxfamily.org/dox/group__SparseCholesky__Module.html>`_
+    (``EIGEN_SPARSE``).
+
+SuiteSparse and Accelerate offer high performance sparse Cholesky
+factorization routines as they level-3 BLAS routines
+internally. Eigen's sparse Cholesky routines are *simplicial* and do
+not use dense linear algebra routines and as a result cannot compete
+with SuiteSparse and Accelerate, especially on large problems. As a
+result to get the best performance out of SuiteSparse it should be
+linked to high quality BLAS and LAPACK implementations e.g. `ATLAS
+<https://math-atlas.sourceforge.net/>`_, `OpenBLAS
+<https://www.openblas.net/>`_ or `Intel MKL
+<https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html>`_.
+
+A critical part of a sparse Cholesky factorization routine is the use
+a fill-reducing ordering. By default Ceres Solver uses the Approximate
+Minimum Degree (``AMD``) ordering, which usually performs well, but
+there are other options that may perform better depending on the
+actual sparsity structure of the Jacobian. See :ref:`section-ordering`
+for more details.
 
 .. _section-cgnr:
 
-``CGNR``
---------
+CGNR
+----
 
-For general sparse problems, if the problem is too large for
-``CHOLMOD`` or a sparse linear algebra library is not linked into
-Ceres, another option is the ``CGNR`` solver. This solver uses the
-Conjugate Gradients solver on the *normal equations*, but without
-forming the normal equations explicitly. It exploits the relation
+For general sparse problems, if the problem is too large for sparse
+Cholesky factorization or a sparse linear algebra library is not
+linked into Ceres, another option is the ``CGNR`` solver. This solver
+uses the `Conjugate Gradients
+<https://en.wikipedia.org/wiki/Conjugate_gradient_method>_` method on
+the *normal equations*, but without forming the normal equations
+explicitly. It exploits the relation
 
 .. math::
     H x = J^\top J x = J^\top(J x)
 
-The convergence of Conjugate Gradients depends on the conditioner
-number :math:`\kappa(H)`. Usually :math:`H` is poorly conditioned and
-a :ref:`section-preconditioner` must be used to get reasonable
-performance. Currently only the ``JACOBI`` preconditioner is available
-for use with ``CGNR``. It uses the block diagonal of :math:`H` to
-precondition the normal equations.
+Because ``CGNR`` never solves the linear system exactly, when the user
+chooses ``CGNR`` as the linear solver, Ceres automatically switches
+from the exact step algorithm to an inexact step algorithm. This also
+means that ``CGNR`` can only be used with ``LEVENBERG_MARQUARDT`` and
+not with ``DOGLEG`` trust region strategy.
 
-When the user chooses ``CGNR`` as the linear solver, Ceres
-automatically switches from the exact step algorithm to an inexact
-step algorithm.
+``CGNR`` by default runs on the CPU. However, if an NVIDIA GPU is
+available and Ceres Solver has been built with CUDA support enabled,
+then the user can also choose to run ``CGNR`` on the GPU by setting
+:member:`Solver::Options::sparse_linear_algebra_library_type` to
+``CUDA_SPARSE``. The key complexity of ``CGNR`` comes from evaluating
+the two sparse-matrix vector products (SpMV) :math:`Jx` and
+:math:`J'y`. GPUs are particularly well suited for doing sparse
+matrix-vector products. As a result, for large problems using a GPU
+can lead to a substantial speedup.
+
+The convergence of Conjugate Gradients depends on the conditioner
+number :math:`\kappa(H)`. Usually :math:`H` is quite poorly
+conditioned and a `Preconditioner
+<https://en.wikipedia.org/wiki/Preconditioner>`_ must be used to get
+reasonable performance. See section on :ref:`section-preconditioner`
+for more details.
 
 .. _section-schur:
 
-``DENSE_SCHUR`` & ``SPARSE_SCHUR``
-----------------------------------
+DENSE_SCHUR & SPARSE_SCHUR
+--------------------------
 
 While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
-adjustment problems, bundle adjustment problem have a special
-structure, and a more efficient scheme for solving :eq:`normal`
-can be constructed.
+adjustment problems, they have a special sparsity structure that can
+be exploited to solve the normal equations more efficiently.
 
-Suppose that the SfM problem consists of :math:`p` cameras and
-:math:`q` points and the variable vector :math:`x` has the block
-structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`. Where,
-:math:`y` and :math:`z` correspond to camera and point parameters,
-respectively.  Further, let the camera blocks be of size :math:`c` and
-the point blocks be of size :math:`s` (for most problems :math:`c` =
-:math:`6`--`9` and :math:`s = 3`). Ceres does not impose any constancy
-requirement on these block sizes, but choosing them to be constant
-simplifies the exposition.
+Suppose that the bundle adjustment problem consists of :math:`p`
+cameras and :math:`q` points and the variable vector :math:`x` has the
+block structure :math:`x = [y_{1}, ... ,y_{p},z_{1},
+... ,z_{q}]`. Where, :math:`y` and :math:`z` correspond to camera and
+point parameters respectively.  Further, let the camera blocks be of
+size :math:`c` and the point blocks be of size :math:`s` (for most
+problems :math:`c` = :math:`6`--`9` and :math:`s = 3`). Ceres does not
+impose any constancy requirement on these block sizes, but choosing
+them to be constant simplifies the exposition.
 
-A key characteristic of the bundle adjustment problem is that there is
-no term :math:`f_{i}` that includes two or more point blocks.  This in
-turn implies that the matrix :math:`H` is of the form
+The key property of bundle adjustment problems which we will exploit
+is the fact that no term :math:`f_{i}` in :eq:`nonlinsq` includes two
+or more point blocks at the same time.  This in turn implies that the
+matrix :math:`H` is of the form
 
 .. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
    :label: hblock
@@ -585,123 +730,208 @@
 observe at least one common point.
 
 
-Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
-:math:`\Delta y`, and then back-substituting :math:`\Delta y` to
+Now :eq:`linear2` can be solved by first forming :math:`S`, solving
+for :math:`\Delta y`, and then back-substituting :math:`\Delta y` to
 obtain the value of :math:`\Delta z`.  Thus, the solution of what was
 an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
 inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
 and matrix-vector multiplies, and the solution of block sparse
 :math:`pc\times pc` linear system :eq:`schur`.  For almost all
 problems, the number of cameras is much smaller than the number of
-points, :math:`p \ll q`, thus solving :eq:`schur` is
-significantly cheaper than solving :eq:`linear2`. This is the
-*Schur complement trick* [Brown]_.
+points, :math:`p \ll q`, thus solving :eq:`schur` is significantly
+cheaper than solving :eq:`linear2`. This is the *Schur complement
+trick* [Brown]_.
 
-This still leaves open the question of solving :eq:`schur`. The
-method of choice for solving symmetric positive definite systems
-exactly is via the Cholesky factorization [TrefethenBau]_ and
-depending upon the structure of the matrix, there are, in general, two
-options. The first is direct factorization, where we store and factor
-:math:`S` as a dense matrix [TrefethenBau]_. This method has
+This still leaves open the question of solving :eq:`schur`. As we
+discussed when considering the exact solution of the normal equations
+using Cholesky factorization, we have two options.
+
+1. ``DENSE_SCHUR`` - The first is **dense Cholesky factorization**,
+where we store and factor :math:`S` as a dense matrix. This method has
 :math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
-is only practical for problems with up to a few hundred cameras. Ceres
-implements this strategy as the ``DENSE_SCHUR`` solver.
+is only practical for problems with up to a few hundred cameras.
 
-
-But, :math:`S` is typically a fairly sparse matrix, as most images
-only see a small fraction of the scene. This leads us to the second
-option: Sparse Direct Methods. These methods store :math:`S` as a
+2. ``SPARSE_SCHUR`` - For large bundle adjustment problems :math:`S`
+is typically a fairly sparse matrix, as most images only see a small
+fraction of the scene. This leads us to the second option: **sparse
+Cholesky factorization** [Davis]_.  Here we store :math:`S` as a
 sparse matrix, use row and column re-ordering algorithms to maximize
 the sparsity of the Cholesky decomposition, and focus their compute
-effort on the non-zero part of the factorization [Chen]_. Sparse
-direct methods, depending on the exact sparsity structure of the Schur
-complement, allow bundle adjustment algorithms to significantly scale
-up over those based on dense factorization. Ceres implements this
-strategy as the ``SPARSE_SCHUR`` solver.
+effort on the non-zero part of the factorization [Davis]_ [Chen]_
+. Sparse direct methods, depending on the exact sparsity structure of
+the Schur complement, allow bundle adjustment algorithms to scenes
+with thousands of cameras.
+
 
 .. _section-iterative_schur:
 
-``ITERATIVE_SCHUR``
--------------------
+ITERATIVE_SCHUR
+---------------
 
-Another option for bundle adjustment problems is to apply
-Preconditioned Conjugate Gradients to the reduced camera matrix
-:math:`S` instead of :math:`H`. One reason to do this is that
-:math:`S` is a much smaller matrix than :math:`H`, but more
-importantly, it can be shown that :math:`\kappa(S)\leq \kappa(H)`.
+Another option for bundle adjustment problems is to apply Conjugate
+Gradients to the reduced camera matrix :math:`S` instead of
+:math:`H`. One reason to do this is that :math:`S` is a much smaller
+matrix than :math:`H`, but more importantly, it can be shown that
+:math:`\kappa(S)\leq \kappa(H)` [Agarwal]_.
+
 Ceres implements Conjugate Gradients on :math:`S` as the
 ``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
 as the linear solver, Ceres automatically switches from the exact step
 algorithm to an inexact step algorithm.
 
+
 The key computational operation when using Conjuagate Gradients is the
 evaluation of the matrix vector product :math:`Sx` for an arbitrary
-vector :math:`x`. There are two ways in which this product can be
-evaluated, and this can be controlled using
-``Solver::Options::use_explicit_schur_complement``. Depending on the
-problem at hand, the performance difference between these two methods
-can be quite substantial.
+vector :math:`x`. Because PCG only needs access to :math:`S` via its
+product with a vector, one way to evaluate :math:`Sx` is to observe
+that
 
-  1. **Implicit** This is default. Implicit evaluation is suitable for
-     large problems where the cost of computing and storing the Schur
-     Complement :math:`S` is prohibitive. Because PCG only needs
-     access to :math:`S` via its product with a vector, one way to
-     evaluate :math:`Sx` is to observe that
+.. math::  x_1 &= E^\top x\\
+           x_2 &= C^{-1} x_1\\
+           x_3 &= Ex_2\\
+           x_4 &= Bx\\
+           Sx &= x_4 - x_3
+   :label: schurtrick1
 
-     .. math::  x_1 &= E^\top x\\
-                x_2 &= C^{-1} x_1\\
-                x_3 &= Ex_2\\
-                x_4 &= Bx\\
-                Sx &= x_4 - x_3
-        :label: schurtrick1
+Thus, we can run Conjugate Gradients on :math:`S` with the same
+computational effort per iteration as Conjugate Gradients on
+:math:`H`, while reaping the benefits of a more powerful
+preconditioner. In fact, we do not even need to compute :math:`H`,
+:eq:`schurtrick1` can be implemented using just the columns of
+:math:`J`.
 
-     Thus, we can run PCG on :math:`S` with the same computational
-     effort per iteration as PCG on :math:`H`, while reaping the
-     benefits of a more powerful preconditioner. In fact, we do not
-     even need to compute :math:`H`, :eq:`schurtrick1` can be
-     implemented using just the columns of :math:`J`.
+Equation :eq:`schurtrick1` is closely related to *Domain Decomposition
+methods* for solving large linear systems that arise in structural
+engineering and partial differential equations. In the language of
+Domain Decomposition, each point in a bundle adjustment problem is a
+domain, and the cameras form the interface between these domains. The
+iterative solution of the Schur complement then falls within the
+sub-category of techniques known as Iterative Sub-structuring [Saad]_
+[Mathew]_.
 
-     Equation :eq:`schurtrick1` is closely related to *Domain
-     Decomposition methods* for solving large linear systems that
-     arise in structural engineering and partial differential
-     equations. In the language of Domain Decomposition, each point in
-     a bundle adjustment problem is a domain, and the cameras form the
-     interface between these domains. The iterative solution of the
-     Schur complement then falls within the sub-category of techniques
-     known as Iterative Sub-structuring [Saad]_ [Mathew]_.
+While in most cases the above method for evaluating :math:`Sx` is the
+way to go, for some problems it is better to compute the Schur
+complemenent :math:`S` explicitly and then run Conjugate Gradients on
+it. This can be done by settin
+``Solver::Options::use_explicit_schur_complement`` to ``true``. This
+option can only be used with the ``SCHUR_JACOBI`` preconditioner.
 
-  2. **Explicit** The complexity of implicit matrix-vector product
-     evaluation scales with the number of non-zeros in the
-     Jacobian. For small to medium sized problems, the cost of
-     constructing the Schur Complement is small enough that it is
-     better to construct it explicitly in memory and use it to
-     evaluate the product :math:`Sx`.
 
-When the user chooses ``ITERATIVE_SCHUR`` as the linear solver, Ceres
-automatically switches from the exact step algorithm to an inexact
-step algorithm.
+.. _section-schur_power_series_expansion:
 
-  .. NOTE::
+SCHUR_POWER_SERIES_EXPANSION
+----------------------------
 
-     In exact arithmetic, the choice of implicit versus explicit Schur
-     complement would have no impact on solution quality. However, in
-     practice if the Jacobian is poorly conditioned, one may observe
-     (usually small) differences in solution quality. This is a
-     natural consequence of performing computations in finite arithmetic.
+It can be shown that the inverse of the Schur complement can be
+written as an infinite power-series [Weber]_ [Zheng]_:
 
+.. math:: S      &= B - EC^{-1}E^\top\\
+	         &= B(I - B^{-1}EC^{-1}E^\top)\\
+	  S^{-1} &= (I - B^{-1}EC^{-1}E^\top)^{-1} B^{-1}\\
+	         & = \sum_{i=0}^\infty \left(B^{-1}EC^{-1}E^\top\right)^{i} B^{-1}
+
+As a result a truncated version of this power series expansion can be
+used to approximate the inverse and therefore the solution to
+:eq:`schur`. Ceres allows the user to use Schur power series expansion
+in three ways.
+
+1. As a linear solver. This is what [Weber]_ calls **Power Bundle
+   Adjustment** and corresponds to using the truncated power series to
+   approximate the inverse of the Schur complement. This is done by
+   setting the following options.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR
+      Solver::Options::preconditioner_type = IDENTITY
+      Solver::Options::use_spse_initialization = true
+      Solver::Options::max_linear_solver_iterations = 0;
+
+      // The following two settings are worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+      Solver::Options::spse_tolerance = 0.1;
+
+
+2. As a preconditioner for ``ITERATIVE_SCHUR``. Any method for
+   approximating the inverse of a matrix can also be used as a
+   preconditioner. This is enabled by setting the following options.
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR
+      Solver::Options::preconditioner_type = SCHUR_POWER_SERIES_EXPANSION;
+      Solver::Options::use_spse_initialization = false;
+
+      // This is worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+
+
+3. As initialization for ``ITERATIIVE_SCHUR`` with any
+   preconditioner. This is a combination of the above two, where the
+   Schur Power Series Expansion
+
+   .. code-block:: c++
+
+      Solver::Options::linear_solver_type = ITERATIVE_SCHUR
+      Solver::Options::preconditioner_type = ... // Preconditioner of your choice.
+      Solver::Options::use_spse_initialization = true
+      Solver::Options::max_linear_solver_iterations = 0;
+
+      // The following two settings are worth tuning for your application.
+      Solver::Options::max_num_spse_iterations = 5;
+      // This only affects the initialization but not the preconditioner.
+      Solver::Options::spse_tolerance = 0.1;
+
+
+.. _section-mixed-precision:
+
+Mixed Precision Solves
+======================
+
+Generally speaking Ceres Solver does all its arithmetic in double
+precision. Sometimes though, one can use single precision arithmetic
+to get substantial speedups. Currently, for linear solvers that
+perform Cholesky factorization (sparse or dense) the user has the
+option cast the linear system to single precision and then use
+single precision Cholesky factorization routines to solve the
+resulting linear system. This can be enabled by setting
+:member:`Solver::Options::use_mixed_precision_solves` to ``true``.
+
+Depending on the conditioning of the problem, the use of single
+precision factorization may lead to some loss of accuracy. Some of
+this accuracy can be recovered by performing `Iterative Refinement
+<https://en.wikipedia.org/wiki/Iterative_refinement>`_. The number of
+iterations of iterative refinement are controlled by
+:member:`Solver::Options::max_num_refinement_iterations`. The default
+value of this parameter is zero, which means if
+:member:`Solver::Options::use_mixed_precision_solves` is ``true``,
+then no iterative refinement is performed. Usually 2-3 refinement
+iterations are enough.
+
+Mixed precision solves are available in the following linear solver
+configurations:
+
+1. ``DENSE_NORMAL_CHOLESKY`` + ``EIGEN``/ ``LAPACK`` / ``CUDA``.
+2. ``DENSE_SCHUR`` + ``EIGEN``/ ``LAPACK`` / ``CUDA``.
+3. ``SPARSE_NORMAL_CHOLESKY`` + ``EIGEN_SPARSE`` / ``ACCELERATE_SPARSE``
+4. ``SPARSE_SCHUR`` + ``EIGEN_SPARSE`` / ``ACCELERATE_SPARSE``
+
+Mixed precision solves area not available when using ``SUITE_SPARSE``
+as the sparse linear algebra backend because SuiteSparse/CHOLMOD does
+not support single precision solves.
 
 .. _section-preconditioner:
 
-Preconditioner
-==============
+Preconditioners
+===============
 
-The convergence rate of Conjugate Gradients for
-solving :eq:`normal` depends on the distribution of eigenvalues
-of :math:`H` [Saad]_. A useful upper bound is
-:math:`\sqrt{\kappa(H)}`, where, :math:`\kappa(H)` is the condition
-number of the matrix :math:`H`. For most bundle adjustment problems,
-:math:`\kappa(H)` is high and a direct application of Conjugate
-Gradients to :eq:`normal` results in extremely poor performance.
+The convergence rate of Conjugate Gradients for solving :eq:`normal`
+depends on the distribution of eigenvalues of :math:`H` [Saad]_. A
+useful upper bound is :math:`\sqrt{\kappa(H)}`, where,
+:math:`\kappa(H)` is the condition number of the matrix :math:`H`. For
+most non-linear least squares problems, :math:`\kappa(H)` is high and
+a direct application of Conjugate Gradients to :eq:`normal` results in
+extremely poor performance.
 
 The solution to this problem is to replace :eq:`normal` with a
 *preconditioned* system.  Given a linear system, :math:`Ax =b` and a
@@ -733,8 +963,15 @@
 problems with general sparsity as well as the special sparsity
 structure encountered in bundle adjustment problems.
 
-``JACOBI``
-----------
+IDENTITY
+--------
+
+This is equivalent to using an identity matrix as a preconditioner,
+i.e. no preconditioner at all.
+
+
+JACOBI
+------
 
 The simplest of all preconditioners is the diagonal or Jacobi
 preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
@@ -742,24 +979,27 @@
 block Jacobi preconditioner. The ``JACOBI`` preconditioner in Ceres
 when used with :ref:`section-cgnr` refers to the block diagonal of
 :math:`H` and when used with :ref:`section-iterative_schur` refers to
-the block diagonal of :math:`B` [Mandel]_. For detailed performance
-data about the performance of ``JACOBI`` on bundle adjustment problems
-see [Agarwal]_.
+the block diagonal of :math:`B` [Mandel]_.
+
+For detailed performance data about the performance of ``JACOBI`` on
+bundle adjustment problems see [Agarwal]_.
 
 
-``SCHUR_JACOBI``
-----------------
+SCHUR_JACOBI
+------------
 
 Another obvious choice for :ref:`section-iterative_schur` is the block
 diagonal of the Schur complement matrix :math:`S`, i.e, the block
 Jacobi preconditioner for :math:`S`. In Ceres we refer to it as the
-``SCHUR_JACOBI`` preconditioner.  For detailed performance data about
-the performance of ``SCHUR_JACOBI`` on bundle adjustment problems see
-[Agarwal]_.
+``SCHUR_JACOBI`` preconditioner.
 
 
-``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``
-----------------------------------------------
+For detailed performance data about the performance of
+``SCHUR_JACOBI`` on bundle adjustment problems see [Agarwal]_.
+
+
+CLUSTER_JACOBI and CLUSTER_TRIDIAGONAL
+--------------------------------------
 
 For bundle adjustment problems arising in reconstruction from
 community photo collections, more effective preconditioners can be
@@ -790,8 +1030,19 @@
 algorithm. The choice of clustering algorithm is controlled by
 :member:`Solver::Options::visibility_clustering_type`.
 
-``SUBSET``
-----------
+SCHUR_POWER_SERIES_EXPANSION
+----------------------------
+
+As explained in :ref:`section-schur_power_series_expansion`, the Schur
+complement matrix admits a power series expansion and a truncated
+version of this power series can be used as a preconditioner for
+``ITERATIVE_SCHUR``. When used as a preconditioner
+:member:`Solver::Options::max_num_spse_iterations` controls the number
+of terms in the power series that are used.
+
+
+SUBSET
+------
 
 This is a  preconditioner for problems with general  sparsity. Given a
 subset  of residual  blocks of  a problem,  it uses  the corresponding
@@ -811,6 +1062,8 @@
 :math:`Q` approximates :math:`J^\top J`, or how well the chosen
 residual blocks approximate the full problem.
 
+This preconditioner is NOT available when running ``CGNR`` using
+``CUDA``.
 
 .. _section-ordering:
 
@@ -821,33 +1074,36 @@
 have a significant of impact on the efficiency and accuracy of the
 method. For example when doing sparse Cholesky factorization, there
 are matrices for which a good ordering will give a Cholesky factor
-with :math:`O(n)` storage, where as a bad ordering will result in an
+with :math:`O(n)` storage, whereas a bad ordering will result in an
 completely dense factor.
 
 Ceres allows the user to provide varying amounts of hints to the
 solver about the variable elimination ordering to use. This can range
-from no hints, where the solver is free to decide the best ordering
-based on the user's choices like the linear solver being used, to an
-exact order in which the variables should be eliminated, and a variety
-of possibilities in between.
+from no hints, where the solver is free to decide the best possible
+ordering based on the user's choices like the linear solver being
+used, to an exact order in which the variables should be eliminated,
+and a variety of possibilities in between.
 
-Instances of the :class:`ParameterBlockOrdering` class are used to
-communicate this information to Ceres.
+The simplest thing to do is to just set
+:member:`Solver::Options::linear_solver_ordering_type` to ``AMD``
+(default) or ``NESDIS`` based on your understanding of the problem or
+empirical testing.
 
-Formally an ordering is an ordered partitioning of the parameter
-blocks. Each parameter block belongs to exactly one group, and each
-group has a unique integer associated with it, that determines its
-order in the set of groups. We call these groups *Elimination Groups*
 
-Given such an ordering, Ceres ensures that the parameter blocks in the
-lowest numbered elimination group are eliminated first, and then the
-parameter blocks in the next lowest numbered elimination group and so
-on. Within each elimination group, Ceres is free to order the
-parameter blocks as it chooses. For example, consider the linear system
+More information can be commmuniucated by using an instance
+:class:`ParameterBlockOrdering` class.
+
+Formally an ordering is an ordered partitioning of the
+parameter blocks, i.e, each parameter block belongs to exactly
+one group, and each group has a unique non-negative integer
+associated with it, that determines its order in the set of
+groups.
+
+e.g. Consider the linear system
 
 .. math::
-  x + y &= 3\\
-  2x + 3y &= 7
+   x + y &= 3 \\
+   2x + 3y &= 7
 
 There are two ways in which it can be solved. First eliminating
 :math:`x` from the two equations, solving for :math:`y` and then back
@@ -855,33 +1111,92 @@
 for :math:`x` and back substituting for :math:`y`. The user can
 construct three orderings here.
 
-1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
-2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
-3. :math:`\{0: x, y\}`        : Solver gets to decide the elimination order.
+1. :math:`\{0: x\}, \{1: y\}` - eliminate :math:`x` first.
+2. :math:`\{0: y\}, \{1: x\}` - eliminate :math:`y` first.
+3. :math:`\{0: x, y\}` - Solver gets to decide the elimination order.
 
-Thus, to have Ceres determine the ordering automatically using
-heuristics, put all the variables in the same elimination group. The
-identity of the group does not matter. This is the same as not
-specifying an ordering at all. To control the ordering for every
-variable, create an elimination group per variable, ordering them in
-the desired order.
+Thus, to have Ceres determine the ordering automatically, put all the
+variables in group 0 and to control the ordering for every variable,
+create groups :math:`0 \dots N-1`, one per variable, in the desired
+order.
+
+``linear_solver_ordering == nullptr`` and an ordering where all the
+parameter blocks are in one elimination group mean the same thing -
+the solver is free to choose what it thinks is the best elimination
+ordering using the ordering algorithm (specified using
+:member:`Solver::Options::linear_solver_ordering_type`). Therefore in
+the following we will only consider the case where
+``linear_solver_ordering != nullptr``.
+
+The exact interpretation of the ``linear_solver_ordering`` depends on
+the values of :member:`Solver::Options::linear_solver_ordering_type`,
+:member:`Solver::Options::linear_solver_type`,
+:member:`Solver::Options::preconditioner_type` and
+:member:`Solver::Options::sparse_linear_algebra_library_type` as we will
+explain below.
+
+Bundle Adjustment
+-----------------
 
 If the user is using one of the Schur solvers (``DENSE_SCHUR``,
 ``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses to specify an
 ordering, it must have one important property. The lowest numbered
 elimination group must form an independent set in the graph
 corresponding to the Hessian, or in other words, no two parameter
-blocks in in the first elimination group should co-occur in the same
+blocks in the first elimination group should co-occur in the same
 residual block. For the best performance, this elimination group
 should be as large as possible. For standard bundle adjustment
 problems, this corresponds to the first elimination group containing
-all the 3d points, and the second containing the all the cameras
-parameter blocks.
+all the 3d points, and the second containing the parameter blocks for
+all the cameras.
 
 If the user leaves the choice to Ceres, then the solver uses an
 approximate maximum independent set algorithm to identify the first
 elimination group [LiSaad]_.
 
+``sparse_linear_algebra_library_type = SUITE_SPARSE``
+-----------------------------------------------------
+
+**linear_solver_ordering_type = AMD**
+
+A constrained Approximate Minimum Degree (CAMD) ordering is used where
+the parameter blocks in the lowest numbered group are eliminated
+first, and then the parameter blocks in the next lowest numbered group
+and so on. Within each group, CAMD is free to order the parameter blocks
+as it chooses.
+
+**linear_solver_ordering_type = NESDIS**
+
+a. ``linear_solver_type = SPARSE_NORMAL_CHOLESKY`` or
+   ``linear_solver_type = CGNR`` and ``preconditioner_type = SUBSET``
+
+   The value of ``linear_solver_ordering`` is ignored and a Nested
+   Dissection algorithm is used to compute a fill reducing ordering.
+
+b. ``linear_solver_type = SPARSE_SCHUR/DENSE_SCHUR/ITERATIVE_SCHUR``
+
+   ONLY the lowest group are used to compute the Schur complement, and
+   Nested Dissection is used to compute a fill reducing ordering for
+   the Schur Complement (or its preconditioner).
+
+``sparse_linear_algebra_library_type = EIGEN_SPARSE/ACCELERATE_SPARSE``
+-----------------------------------------------------------------------
+
+a. ``linear_solver_type = SPARSE_NORMAL_CHOLESKY`` or
+   ``linear_solver_type = CGNR`` and ``preconditioner_type = SUBSET``
+
+   The value of ``linear_solver_ordering`` is ignored and ``AMD`` or
+   ``NESDIS`` is used to compute a fill reducing ordering as requested
+   by the user.
+
+b. ``linear_solver_type = SPARSE_SCHUR/DENSE_SCHUR/ITERATIVE_SCHUR``
+
+   ONLY the lowest group are used to compute the Schur complement, and
+   ``AMD`` or ``NESID`` is used to compute a fill reducing ordering
+   for the Schur Complement (or its preconditioner) as requested by
+   the user.
+
+
 .. _section-solver-options:
 
 :class:`Solver::Options`
@@ -892,7 +1207,7 @@
    :class:`Solver::Options` controls the overall behavior of the
    solver. We list the various settings and their default values below.
 
-.. function:: bool Solver::Options::IsValid(string* error) const
+.. function:: bool Solver::Options::IsValid(std::string* error) const
 
    Validate the values in the options struct and returns true on
    success. If there is a problem, the method returns false with
@@ -913,14 +1228,18 @@
    Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
    ``BFGS`` and ``LBFGS``.
 
+   See :ref:`section-line-search-methods` for more details.
+
 .. member:: LineSearchType Solver::Options::line_search_type
 
    Default: ``WOLFE``
 
    Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
    Note that in order for the assumptions underlying the ``BFGS`` and
-   ``LBFGS`` line search direction algorithms to be guaranteed to be
-   satisifed, the ``WOLFE`` line search should be used.
+   ``LBFGS`` line search direction algorithms to be satisfied, the
+   ``WOLFE`` line search must be used.
+
+   See :ref:`section-line-search-methods` for more details.
 
 .. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type
 
@@ -931,26 +1250,28 @@
 
 .. member:: int Solver::Options::max_lbfgs_rank
 
-   Default: 20
+   Default: ``20``
 
-   The L-BFGS hessian approximation is a low rank approximation to the
-   inverse of the Hessian matrix. The rank of the approximation
-   determines (linearly) the space and time complexity of using the
-   approximation. Higher the rank, the better is the quality of the
-   approximation. The increase in quality is however is bounded for a
-   number of reasons.
+   The LBFGS hessian approximation is a low rank approximation to
+   the inverse of the Hessian matrix. The rank of the
+   approximation determines (linearly) the space and time
+   complexity of using the approximation. Higher the rank, the
+   better is the quality of the approximation. The increase in
+   quality is however is bounded for a number of reasons.
 
-     1. The method only uses secant information and not actual
-        derivatives.
+   1. The method only uses secant information and not actual
+   derivatives.
+   2. The Hessian approximation is constrained to be positive
+   definite.
 
-     2. The Hessian approximation is constrained to be positive
-        definite.
+   So increasing this rank to a large number will cost time and
+   space complexity without the corresponding increase in solution
+   quality. There are no hard and fast rules for choosing the
+   maximum rank. The best choice usually requires some problem
+   specific experimentation.
 
-   So increasing this rank to a large number will cost time and space
-   complexity without the corresponding increase in solution
-   quality. There are no hard and fast rules for choosing the maximum
-   rank. The best choice usually requires some problem specific
-   experimentation.
+   For more theoretical and implementation details of the LBFGS
+   method, please see [Nocedal]_.
 
 .. member:: bool Solver::Options::use_approximate_eigenvalue_bfgs_scaling
 
@@ -999,6 +1320,8 @@
 
 .. member:: double Solver::Options::min_line_search_step_size
 
+   Default: ``1e-9``
+
    The line search terminates if:
 
    .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}
@@ -1153,7 +1476,8 @@
 
 .. member:: double Solver::Options::max_solver_time_in_seconds
 
-   Default: ``1e6``
+   Default: ``1e9``
+
    Maximum amount of time for which the solver should run.
 
 .. member:: int Solver::Options::num_threads
@@ -1239,8 +1563,8 @@
 
    where :math:`\|\cdot\|_\infty` refers to the max norm, :math:`\Pi`
    is projection onto the bounds constraints and :math:`\boxplus` is
-   Plus operation for the overall local parameterization associated
-   with the parameter vector.
+   Plus operation for the overall manifold associated with the
+   parameter vector.
 
 .. member:: double Solver::Options::parameter_tolerance
 
@@ -1260,7 +1584,7 @@
    Type of linear solver used to compute the solution to the linear
    least squares problem in each iteration of the Levenberg-Marquardt
    algorithm. If Ceres is built with support for ``SuiteSparse`` or
-   ``CXSparse`` or ``Eigen``'s sparse Cholesky factorization, the
+   ``Accelerate`` or ``Eigen``'s sparse Cholesky factorization, the
    default is ``SPARSE_NORMAL_CHOLESKY``, it is ``DENSE_QR``
    otherwise.
 
@@ -1271,8 +1595,9 @@
    The preconditioner used by the iterative linear solver. The default
    is the block Jacobi preconditioner. Valid values are (in increasing
    order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``,
-   ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See
-   :ref:`section-preconditioner` for more details.
+   ``CLUSTER_JACOBI``, ``CLUSTER_TRIDIAGONAL``, ``SUBSET`` and
+   ``SCHUR_POWER_SERIES_EXPANSION``. See :ref:`section-preconditioner`
+   for more details.
 
 .. member:: VisibilityClusteringType Solver::Options::visibility_clustering_type
 
@@ -1296,7 +1621,7 @@
    recommend that you try ``CANONICAL_VIEWS`` first and if it is too
    expensive try ``SINGLE_LINKAGE``.
 
-.. member:: std::unordered_set<ResidualBlockId> residual_blocks_for_subset_preconditioner
+.. member:: std::unordered_set<ResidualBlockId> Solver::Options::residual_blocks_for_subset_preconditioner
 
    ``SUBSET`` preconditioner is a preconditioner for problems with
    general sparsity. Given a subset of residual blocks of a problem,
@@ -1321,65 +1646,76 @@
 
 .. member:: DenseLinearAlgebraLibrary Solver::Options::dense_linear_algebra_library_type
 
-   Default:``EIGEN``
+   Default: ``EIGEN``
 
    Ceres supports using multiple dense linear algebra libraries for
-   dense matrix factorizations. Currently ``EIGEN`` and ``LAPACK`` are
-   the valid choices. ``EIGEN`` is always available, ``LAPACK`` refers
-   to the system ``BLAS + LAPACK`` library which may or may not be
-   available.
+   dense matrix factorizations. Currently ``EIGEN``, ``LAPACK`` and
+   ``CUDA`` are the valid choices. ``EIGEN`` is always available,
+   ``LAPACK`` refers to the system ``BLAS + LAPACK`` library which may
+   or may not be available. ``CUDA`` refers to Nvidia's GPU based
+   dense linear algebra library which may or may not be available.
 
    This setting affects the ``DENSE_QR``, ``DENSE_NORMAL_CHOLESKY``
-   and ``DENSE_SCHUR`` solvers. For small to moderate sized probem
+   and ``DENSE_SCHUR`` solvers. For small to moderate sized problem
    ``EIGEN`` is a fine choice but for large problems, an optimized
-   ``LAPACK + BLAS`` implementation can make a substantial difference
-   in performance.
+   ``LAPACK + BLAS`` or ``CUDA`` implementation can make a substantial
+   difference in performance.
 
 .. member:: SparseLinearAlgebraLibrary Solver::Options::sparse_linear_algebra_library_type
 
    Default: The highest available according to: ``SUITE_SPARSE`` >
-   ``CX_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``
+   ``ACCELERATE_SPARSE`` > ``EIGEN_SPARSE`` > ``NO_SPARSE``
 
    Ceres supports the use of three sparse linear algebra libraries,
    ``SuiteSparse``, which is enabled by setting this parameter to
-   ``SUITE_SPARSE``, ``CXSparse``, which can be selected by setting
-   this parameter to ``CX_SPARSE`` and ``Eigen`` which is enabled by
-   setting this parameter to ``EIGEN_SPARSE``.  Lastly, ``NO_SPARSE``
-   means that no sparse linear solver should be used; note that this is
-   irrespective of whether Ceres was compiled with support for one.
+   ``SUITE_SPARSE``, ``Acclerate``, which can be selected by setting
+   this parameter to ``ACCELERATE_SPARSE`` and ``Eigen`` which is
+   enabled by setting this parameter to ``EIGEN_SPARSE``.  Lastly,
+   ``NO_SPARSE`` means that no sparse linear solver should be used;
+   note that this is irrespective of whether Ceres was compiled with
+   support for one.
 
-   ``SuiteSparse`` is a sophisticated and complex sparse linear
-   algebra library and should be used in general.
+   ``SuiteSparse`` is a sophisticated sparse linear algebra library
+   and should be used in general. On MacOS you may want to use the
+   ``Accelerate`` framework.
 
    If your needs/platforms prevent you from using ``SuiteSparse``,
-   consider using ``CXSparse``, which is a much smaller, easier to
-   build library. As can be expected, its performance on large
-   problems is not comparable to that of ``SuiteSparse``.
+   consider using the sparse linear algebra routines in ``Eigen``. The
+   sparse Cholesky algorithms currently included with ``Eigen`` are
+   not as sophisticated as the ones in ``SuiteSparse`` and
+   ``Accelerate`` and as a result its performance is considerably
+   worse.
 
-   Last but not the least you can use the sparse linear algebra
-   routines in ``Eigen``. Currently the performance of this library is
-   the poorest of the three. But this should change in the near
-   future.
+.. member:: LinearSolverOrderingType Solver::Options::linear_solver_ordering_type
 
-   Another thing to consider here is that the sparse Cholesky
-   factorization libraries in Eigen are licensed under ``LGPL`` and
-   building Ceres with support for ``EIGEN_SPARSE`` will result in an
-   LGPL licensed library (since the corresponding code from Eigen is
-   compiled into the library).
+   Default: ``AMD``
 
-   The upside is that you do not need to build and link to an external
-   library to use ``EIGEN_SPARSE``.
+    The order in which variables are eliminated in a linear solver can
+    have a significant impact on the efficiency and accuracy of the
+    method. e.g., when doing sparse Cholesky factorization, there are
+    matrices for which a good ordering will give a Cholesky factor
+    with :math:`O(n)` storage, where as a bad ordering will result in
+    an completely dense factor.
 
+    Sparse direct solvers like ``SPARSE_NORMAL_CHOLESKY`` and
+    ``SPARSE_SCHUR`` use a fill reducing ordering of the columns and
+    rows of the matrix being factorized before computing the numeric
+    factorization.
 
-.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering
+    This enum controls the type of algorithm used to compute this fill
+    reducing ordering. There is no single algorithm that works on all
+    matrices, so determining which algorithm works better is a matter
+    of empirical experimentation.
 
-   Default: ``NULL``
+.. member:: std::shared_ptr<ParameterBlockOrdering> Solver::Options::linear_solver_ordering
+
+   Default: ``nullptr``
 
    An instance of the ordering object informs the solver about the
    desired order in which parameter blocks should be eliminated by the
-   linear solvers. See section~\ref{sec:ordering`` for more details.
+   linear solvers.
 
-   If ``NULL``, the solver is free to choose an ordering that it
+   If ``nullptr``, the solver is free to choose an ordering that it
    thinks is best.
 
    See :ref:`section-ordering` for more details.
@@ -1404,41 +1740,15 @@
    efficient to explicitly compute it and use it for evaluating the
    matrix-vector products.
 
-   Enabling this option tells ``ITERATIVE_SCHUR`` to use an explicitly
-   computed Schur complement. This can improve the performance of the
-   ``ITERATIVE_SCHUR`` solver significantly.
-
-   .. NOTE:
+   .. NOTE::
 
      This option can only be used with the ``SCHUR_JACOBI``
      preconditioner.
 
-.. member:: bool Solver::Options::use_post_ordering
+.. member:: bool Solver::Options::dynamic_sparsity
 
    Default: ``false``
 
-   Sparse Cholesky factorization algorithms use a fill-reducing
-   ordering to permute the columns of the Jacobian matrix. There are
-   two ways of doing this.
-
-   1. Compute the Jacobian matrix in some order and then have the
-      factorization algorithm permute the columns of the Jacobian.
-
-   2. Compute the Jacobian with its columns already permuted.
-
-   The first option incurs a significant memory penalty. The
-   factorization algorithm has to make a copy of the permuted Jacobian
-   matrix, thus Ceres pre-permutes the columns of the Jacobian matrix
-   and generally speaking, there is no performance penalty for doing
-   so.
-
-   In some rare cases, it is worth using a more complicated reordering
-   algorithm which has slightly better runtime performance at the
-   expense of an extra copy of the Jacobian matrix. Setting
-   ``use_postordering`` to ``true`` enables this tradeoff.
-
-.. member:: bool Solver::Options::dynamic_sparsity
-
    Some non-linear least squares problems are symbolically dense but
    numerically sparse. i.e. at any given state only a small number of
    Jacobian entries are non-zero, but the position and number of
@@ -1453,6 +1763,29 @@
 
    This setting only affects the `SPARSE_NORMAL_CHOLESKY` solver.
 
+.. member:: bool Solver::Options::use_mixed_precision_solves
+
+   Default: ``false``
+
+   If true, the Gauss-Newton matrix is computed in *double* precision, but
+   its factorization is computed in **single** precision. This can result in
+   significant time and memory savings at the cost of some accuracy in the
+   Gauss-Newton step. Iterative refinement is used to recover some
+   of this accuracy back.
+
+   If ``use_mixed_precision_solves`` is true, we recommend setting
+   ``max_num_refinement_iterations`` to 2-3.
+
+   See :ref:`section-mixed-precision` for more details.
+
+.. member:: int Solver::Options::max_num_refinement_iterations
+
+   Default: ``0``
+
+   Number steps of the iterative refinement process to run when
+   computing the Gauss-Newton step, see
+   :member:`Solver::Options::use_mixed_precision_solves`.
+
 .. member:: int Solver::Options::min_linear_solver_iterations
 
    Default: ``0``
@@ -1469,6 +1802,34 @@
    makes sense when the linear solver is an iterative solver, e.g.,
    ``ITERATIVE_SCHUR`` or ``CGNR``.
 
+.. member:: int Solver::Options::max_num_spse_iterations
+
+   Default: `5`
+
+   Maximum number of iterations performed by
+   ``SCHUR_POWER_SERIES_EXPANSION``. Each iteration corresponds to one
+   more term in the power series expansion od the inverse of the Schur
+   complement.  This value controls the maximum number of iterations
+   whether it is used as a preconditioner or just to initialize the
+   solution for ``ITERATIVE_SCHUR``.
+
+.. member:: bool Solver:Options::use_spse_initialization
+
+   Default: ``false``
+
+   Use Schur power series expansion to initialize the solution for
+   ``ITERATIVE_SCHUR``. This option can be set ``true`` regardless of
+   what preconditioner is being used.
+
+.. member:: double Solver::Options::spse_tolerance
+
+   Default: `0.1`
+
+   When ``use_spse_initialization`` is ``true``, this parameter along
+   with ``max_num_spse_iterations`` controls the number of
+   ``SCHUR_POWER_SERIES_EXPANSION`` iterations performed for
+   initialization. It is not used to control the preconditioner.
+
 .. member:: double Solver::Options::eta
 
    Default: ``1e-1``
@@ -1502,6 +1863,25 @@
    objects that have an :class:`EvaluationCallback` associated with
    them.
 
+.. member:: std::shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
+
+   Default: ``nullptr``
+
+   If :member:`Solver::Options::use_inner_iterations` true, then the
+   user has two choices.
+
+   1. Let the solver heuristically decide which parameter blocks to
+      optimize in each inner iteration. To do this, set
+      :member:`Solver::Options::inner_iteration_ordering` to ``nullptr``.
+
+   2. Specify a collection of of ordered independent sets. The lower
+      numbered groups are optimized before the higher number groups
+      during the inner optimization phase. Each group must be an
+      independent set. Not all parameter blocks need to be included in
+      the ordering.
+
+   See :ref:`section-ordering` for more details.
+
 .. member:: double Solver::Options::inner_iteration_tolerance
 
    Default: ``1e-3``
@@ -1516,37 +1896,21 @@
    inner iterations in subsequent trust region minimizer iterations is
    disabled.
 
-.. member:: shared_ptr<ParameterBlockOrdering> Solver::Options::inner_iteration_ordering
-
-   Default: ``NULL``
-
-   If :member:`Solver::Options::use_inner_iterations` true, then the
-   user has two choices.
-
-   1. Let the solver heuristically decide which parameter blocks to
-      optimize in each inner iteration. To do this, set
-      :member:`Solver::Options::inner_iteration_ordering` to ``NULL``.
-
-   2. Specify a collection of of ordered independent sets. The lower
-      numbered groups are optimized before the higher number groups
-      during the inner optimization phase. Each group must be an
-      independent set. Not all parameter blocks need to be included in
-      the ordering.
-
-   See :ref:`section-ordering` for more details.
 
 .. member:: LoggingType Solver::Options::logging_type
 
    Default: ``PER_MINIMIZER_ITERATION``
 
+   Valid values are ``SILENT`` and ``PER_MINIMIZER_ITERATION``.
+
 .. member:: bool Solver::Options::minimizer_progress_to_stdout
 
    Default: ``false``
 
-   By default the :class:`Minimizer` progress is logged to ``STDERR``
+   By default the Minimizer's progress is logged to ``STDERR``
    depending on the ``vlog`` level. If this flag is set to true, and
-   :member:`Solver::Options::logging_type` is not ``SILENT``, the logging
-   output is sent to ``STDOUT``.
+   :member:`Solver::Options::logging_type` is not ``SILENT``, the
+   logging output is sent to ``STDOUT``.
 
    For ``TRUST_REGION_MINIMIZER`` the progress display looks like
 
@@ -1595,7 +1959,7 @@
    #. ``it`` is the time take by the current iteration.
    #. ``tt`` is the total time taken by the minimizer.
 
-.. member:: vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
+.. member:: std::vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump
 
    Default: ``empty``
 
@@ -1603,7 +1967,7 @@
    the trust region problem. Useful for testing and benchmarking. If
    ``empty``, no problems are dumped.
 
-.. member:: string Solver::Options::trust_region_problem_dump_directory
+.. member:: std::string Solver::Options::trust_region_problem_dump_directory
 
    Default: ``/tmp``
 
@@ -1614,7 +1978,7 @@
     :member:`Solver::Options::trust_region_problem_dump_format_type` is not
     ``CONSOLE``.
 
-.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format
+.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format_type
 
    Default: ``TEXTFILE``
 
@@ -1708,23 +2072,32 @@
    should not expect to look at the parameter blocks and interpret
    their values.
 
-.. member:: vector<IterationCallback> Solver::Options::callbacks
+.. member:: std::vector<IterationCallback*> Solver::Options::callbacks
+
+   Default: ``empty``
 
    Callbacks that are executed at the end of each iteration of the
-   :class:`Minimizer`. They are executed in the order that they are
+   minimizer. They are executed in the order that they are
    specified in this vector.
 
    By default, parameter blocks are updated only at the end of the
-   optimization, i.e., when the :class:`Minimizer` terminates. This
-   means that by default, if an :class:`IterationCallback` inspects
-   the parameter blocks, they will not see them changing in the course
-   of the optimization.
+   optimization, i.e., when the minimizer terminates. This means that
+   by default, if an :class:`IterationCallback` inspects the parameter
+   blocks, they will not see them changing in the course of the
+   optimization.
 
    To tell Ceres to update the parameter blocks at the end of each
    iteration and before calling the user's callback, set
    :member:`Solver::Options::update_state_every_iteration` to
    ``true``.
 
+   See `examples/iteration_callback_example.cc
+   <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/iteration_callback_example.cc>`_
+   for an example of an :class:`IterationCallback` that uses
+   :member:`Solver::Options::update_state_every_iteration` to log
+   changes to the parameter blocks over the course of the
+   optimization.
+
    The solver does NOT take ownership of these pointers.
 
 :class:`ParameterBlockOrdering`
@@ -1786,8 +2159,8 @@
 
    Number of groups with one or more elements.
 
-:class:`IterationCallback`
-==========================
+:class:`IterationSummary`
+=========================
 
 .. class:: IterationSummary
 
@@ -1863,7 +2236,7 @@
 
    Size of the trust region at the end of the current iteration. For
    the Levenberg-Marquardt algorithm, the regularization parameter is
-   1.0 / member::`IterationSummary::trust_region_radius`.
+   1.0 / :member:`IterationSummary::trust_region_radius`.
 
 .. member:: double IterationSummary::eta
 
@@ -1906,6 +2279,8 @@
 
    Time (in seconds) since the user called Solve().
 
+:class:`IterationCallback`
+==========================
 
 .. class:: IterationCallback
 
@@ -1939,6 +2314,10 @@
   #. ``SOLVER_CONTINUE`` indicates that the solver should continue
      optimizing.
 
+  The return values can be used to implement custom termination
+  criterion that supercede the iteration/time/tolerance based
+  termination implemented by Ceres.
+
   For example, the following :class:`IterationCallback` is used
   internally by Ceres to log the progress of the optimization.
 
@@ -1978,6 +2357,12 @@
     };
 
 
+  See `examples/evaluation_callback_example.cc
+  <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/iteration_callback_example.cc>`_
+  for another example that uses
+  :member:`Solver::Options::update_state_every_iteration` to log
+  changes to the parameter blocks over the course of the optimization.
+
 
 :class:`CRSMatrix`
 ==================
@@ -1995,13 +2380,13 @@
 
    Number of columns.
 
-.. member:: vector<int> CRSMatrix::rows
+.. member:: std::vector<int> CRSMatrix::rows
 
    :member:`CRSMatrix::rows` is a :member:`CRSMatrix::num_rows` + 1
    sized array that points into the :member:`CRSMatrix::cols` and
    :member:`CRSMatrix::values` array.
 
-.. member:: vector<int> CRSMatrix::cols
+.. member:: std::vector<int> CRSMatrix::cols
 
    :member:`CRSMatrix::cols` contain as many entries as there are
    non-zeros in the matrix.
@@ -2009,7 +2394,7 @@
    For each row ``i``, ``cols[rows[i]]`` ... ``cols[rows[i + 1] - 1]``
    are the indices of the non-zero columns of row ``i``.
 
-.. member:: vector<int> CRSMatrix::values
+.. member:: std::vector<double> CRSMatrix::values
 
    :member:`CRSMatrix::values` contain as many entries as there are
    non-zeros in the matrix.
@@ -2043,12 +2428,12 @@
 
    Summary of the various stages of the solver after termination.
 
-.. function:: string Solver::Summary::BriefReport() const
+.. function:: std::string Solver::Summary::BriefReport() const
 
    A brief one line description of the state of the solver after
    termination.
 
-.. function:: string Solver::Summary::FullReport() const
+.. function:: std::string Solver::Summary::FullReport() const
 
    A full multiline description of the state of the solver after
    termination.
@@ -2071,7 +2456,7 @@
 
    The cause of the minimizer terminating.
 
-.. member:: string Solver::Summary::message
+.. member:: std::string Solver::Summary::message
 
    Reason why the solver terminated.
 
@@ -2091,14 +2476,14 @@
    were held fixed by the preprocessor because all the parameter
    blocks that they depend on were fixed.
 
-.. member:: vector<IterationSummary> Solver::Summary::iterations
+.. member:: std::vector<IterationSummary> Solver::Summary::iterations
 
    :class:`IterationSummary` for each minimizer iteration in order.
 
 .. member:: int Solver::Summary::num_successful_steps
 
    Number of minimizer iterations in which the step was
-   accepted. Unless :member:`Solver::Options::use_non_monotonic_steps`
+   accepted. Unless :member:`Solver::Options::use_nonmonotonic_steps`
    is `true` this is also the number of steps in which the objective
    function value/cost went down.
 
@@ -2112,13 +2497,13 @@
 
    Number of times inner iterations were performed.
 
- .. member:: int Solver::Summary::num_line_search_steps
+.. member:: int Solver::Summary::num_line_search_steps
 
-    Total number of iterations inside the line search algorithm across
-    all invocations. We call these iterations "steps" to distinguish
-    them from the outer iterations of the line search and trust region
-    minimizer algorithms which call the line search algorithm as a
-    subroutine.
+   Total number of iterations inside the line search algorithm across
+   all invocations. We call these iterations "steps" to distinguish
+   them from the outer iterations of the line search and trust region
+   minimizer algorithms which call the line search algorithm as a
+   subroutine.
 
 .. member:: double Solver::Summary::preprocessor_time_in_seconds
 
@@ -2126,7 +2511,7 @@
 
 .. member:: double Solver::Summary::minimizer_time_in_seconds
 
-   Time (in seconds) spent in the Minimizer.
+   Time (in seconds) spent in the minimizer.
 
 .. member:: double Solver::Summary::postprocessor_time_in_seconds
 
@@ -2180,7 +2565,7 @@
    Dimension of the tangent space of the problem (or the number of
    columns in the Jacobian for the problem). This is different from
    :member:`Solver::Summary::num_parameters` if a parameter block is
-   associated with a :class:`LocalParameterization`.
+   associated with a :class:`Manifold`.
 
 .. member:: int Solver::Summary::num_residual_blocks
 
@@ -2206,7 +2591,7 @@
    number of columns in the Jacobian for the reduced problem). This is
    different from :member:`Solver::Summary::num_parameters_reduced` if
    a parameter block in the reduced problem is associated with a
-   :class:`LocalParameterization`.
+   :class:`Manifold`.
 
 .. member:: int Solver::Summary::num_residual_blocks_reduced
 
@@ -2224,9 +2609,7 @@
 .. member:: int Solver::Summary::num_threads_used
 
    Number of threads actually used by the solver for Jacobian and
-   residual evaluation. This number is not equal to
-   :member:`Solver::Summary::num_threads_given` if none of `OpenMP`
-   or `CXX_THREADS` is available.
+   residual evaluation.
 
 .. member:: LinearSolverType Solver::Summary::linear_solver_type_given
 
@@ -2242,12 +2625,12 @@
    `SPARSE_NORMAL_CHOLESKY` but no sparse linear algebra library was
    available.
 
-.. member:: vector<int> Solver::Summary::linear_solver_ordering_given
+.. member:: std::vector<int> Solver::Summary::linear_solver_ordering_given
 
    Size of the elimination groups given by the user as hints to the
    linear solver.
 
-.. member:: vector<int> Solver::Summary::linear_solver_ordering_used
+.. member:: std::vector<int> Solver::Summary::linear_solver_ordering_used
 
    Size of the parameter groups used by the solver when ordering the
    columns of the Jacobian.  This maybe different from
@@ -2285,12 +2668,12 @@
    actually performed. For example, in a problem with just one parameter
    block, inner iterations are not performed.
 
-.. member:: vector<int> inner_iteration_ordering_given
+.. member:: std::vector<int> Solver::Summary::inner_iteration_ordering_given
 
    Size of the parameter groups given by the user for performing inner
    iterations.
 
-.. member:: vector<int> inner_iteration_ordering_used
+.. member:: std::vector<int> Solver::Summary::inner_iteration_ordering_used
 
    Size of the parameter groups given used by the solver for
    performing inner iterations. This maybe different from
@@ -2315,7 +2698,7 @@
 
    Type of clustering algorithm used for visibility based
    preconditioning. Only meaningful when the
-   :member:`Solver::Summary::preconditioner_type` is
+   :member:`Solver::Summary::preconditioner_type_used` is
    ``CLUSTER_JACOBI`` or ``CLUSTER_TRIDIAGONAL``.
 
 .. member:: TrustRegionStrategyType Solver::Summary::trust_region_strategy_type
diff --git a/docs/source/nnls_tutorial.rst b/docs/source/nnls_tutorial.rst
index 6c89032..6de800e 100644
--- a/docs/source/nnls_tutorial.rst
+++ b/docs/source/nnls_tutorial.rst
@@ -2,6 +2,8 @@
 
 .. default-domain:: cpp
 
+.. cpp:namespace:: ceres
+
 .. _chapter-nnls_tutorial:
 
 ========================
@@ -110,7 +112,7 @@
      // Set up the only cost function (also known as residual). This uses
      // auto-differentiation to obtain the derivative (jacobian).
      CostFunction* cost_function =
-         new AutoDiffCostFunction<CostFunctor, 1, 1>(new CostFunctor);
+         new AutoDiffCostFunction<CostFunctor, 1, 1>();
      problem.AddResidualBlock(cost_function, nullptr, &x);
 
      // Run the solver!
@@ -210,8 +212,7 @@
 .. code-block:: c++
 
   CostFunction* cost_function =
-    new NumericDiffCostFunction<NumericDiffCostFunctor, ceres::CENTRAL, 1, 1>(
-        new NumericDiffCostFunctor);
+    new NumericDiffCostFunction<NumericDiffCostFunctor, ceres::CENTRAL, 1, 1>();
   problem.AddResidualBlock(cost_function, nullptr, &x);
 
 Notice the parallel from when we were using automatic differentiation
@@ -219,7 +220,7 @@
 .. code-block:: c++
 
   CostFunction* cost_function =
-      new AutoDiffCostFunction<CostFunctor, 1, 1>(new CostFunctor);
+      new AutoDiffCostFunction<CostFunctor, 1, 1>();
   problem.AddResidualBlock(cost_function, nullptr, &x);
 
 The construction looks almost identical to the one used for automatic
@@ -355,16 +356,16 @@
 
   Problem problem;
 
-  // Add residual terms to the problem using the using the autodiff
+  // Add residual terms to the problem using the autodiff
   // wrapper to get the derivatives automatically.
   problem.AddResidualBlock(
-    new AutoDiffCostFunction<F1, 1, 1, 1>(new F1), nullptr, &x1, &x2);
+    new AutoDiffCostFunction<F1, 1, 1, 1>(), nullptr, &x1, &x2);
   problem.AddResidualBlock(
-    new AutoDiffCostFunction<F2, 1, 1, 1>(new F2), nullptr, &x3, &x4);
+    new AutoDiffCostFunction<F2, 1, 1, 1>(), nullptr, &x3, &x4);
   problem.AddResidualBlock(
-    new AutoDiffCostFunction<F3, 1, 1, 1>(new F3), nullptr, &x2, &x3)
+    new AutoDiffCostFunction<F3, 1, 1, 1>(), nullptr, &x2, &x3);
   problem.AddResidualBlock(
-    new AutoDiffCostFunction<F4, 1, 1, 1>(new F4), nullptr, &x1, &x4);
+    new AutoDiffCostFunction<F4, 1, 1, 1>(), nullptr, &x1, &x4);
 
 
 Note that each ``ResidualBlock`` only depends on the two parameters
@@ -377,62 +378,65 @@
 
     Initial x1 = 3, x2 = -1, x3 = 0, x4 = 1
     iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
-       0  1.075000e+02    0.00e+00    1.55e+02   0.00e+00   0.00e+00  1.00e+04       0    4.95e-04    2.30e-03
-       1  5.036190e+00    1.02e+02    2.00e+01   2.16e+00   9.53e-01  3.00e+04       1    4.39e-05    2.40e-03
-       2  3.148168e-01    4.72e+00    2.50e+00   6.23e-01   9.37e-01  9.00e+04       1    9.06e-06    2.43e-03
-       3  1.967760e-02    2.95e-01    3.13e-01   3.08e-01   9.37e-01  2.70e+05       1    8.11e-06    2.45e-03
-       4  1.229900e-03    1.84e-02    3.91e-02   1.54e-01   9.37e-01  8.10e+05       1    6.91e-06    2.48e-03
-       5  7.687123e-05    1.15e-03    4.89e-03   7.69e-02   9.37e-01  2.43e+06       1    7.87e-06    2.50e-03
-       6  4.804625e-06    7.21e-05    6.11e-04   3.85e-02   9.37e-01  7.29e+06       1    5.96e-06    2.52e-03
-       7  3.003028e-07    4.50e-06    7.64e-05   1.92e-02   9.37e-01  2.19e+07       1    5.96e-06    2.55e-03
-       8  1.877006e-08    2.82e-07    9.54e-06   9.62e-03   9.37e-01  6.56e+07       1    5.96e-06    2.57e-03
-       9  1.173223e-09    1.76e-08    1.19e-06   4.81e-03   9.37e-01  1.97e+08       1    7.87e-06    2.60e-03
-      10  7.333425e-11    1.10e-09    1.49e-07   2.40e-03   9.37e-01  5.90e+08       1    6.20e-06    2.63e-03
-      11  4.584044e-12    6.88e-11    1.86e-08   1.20e-03   9.37e-01  1.77e+09       1    6.91e-06    2.65e-03
-      12  2.865573e-13    4.30e-12    2.33e-09   6.02e-04   9.37e-01  5.31e+09       1    5.96e-06    2.67e-03
-      13  1.791438e-14    2.69e-13    2.91e-10   3.01e-04   9.37e-01  1.59e+10       1    7.15e-06    2.69e-03
+       0  1.075000e+02    0.00e+00    1.55e+02   0.00e+00   0.00e+00  1.00e+04        0    2.91e-05    3.40e-04
+       1  5.036190e+00    1.02e+02    2.00e+01   0.00e+00   9.53e-01  3.00e+04        1    4.98e-05    3.99e-04
+       2  3.148168e-01    4.72e+00    2.50e+00   6.23e-01   9.37e-01  9.00e+04        1    2.15e-06    4.06e-04
+       3  1.967760e-02    2.95e-01    3.13e-01   3.08e-01   9.37e-01  2.70e+05        1    9.54e-07    4.10e-04
+       4  1.229900e-03    1.84e-02    3.91e-02   1.54e-01   9.37e-01  8.10e+05        1    1.91e-06    4.14e-04
+       5  7.687123e-05    1.15e-03    4.89e-03   7.69e-02   9.37e-01  2.43e+06        1    1.91e-06    4.18e-04
+       6  4.804625e-06    7.21e-05    6.11e-04   3.85e-02   9.37e-01  7.29e+06        1    1.19e-06    4.21e-04
+       7  3.003028e-07    4.50e-06    7.64e-05   1.92e-02   9.37e-01  2.19e+07        1    1.91e-06    4.25e-04
+       8  1.877006e-08    2.82e-07    9.54e-06   9.62e-03   9.37e-01  6.56e+07        1    9.54e-07    4.28e-04
+       9  1.173223e-09    1.76e-08    1.19e-06   4.81e-03   9.37e-01  1.97e+08        1    9.54e-07    4.32e-04
+      10  7.333425e-11    1.10e-09    1.49e-07   2.40e-03   9.37e-01  5.90e+08        1    9.54e-07    4.35e-04
+      11  4.584044e-12    6.88e-11    1.86e-08   1.20e-03   9.37e-01  1.77e+09        1    9.54e-07    4.38e-04
+      12  2.865573e-13    4.30e-12    2.33e-09   6.02e-04   9.37e-01  5.31e+09        1    2.15e-06    4.42e-04
+      13  1.791438e-14    2.69e-13    2.91e-10   3.01e-04   9.37e-01  1.59e+10        1    1.91e-06    4.45e-04
+      14  1.120029e-15    1.68e-14    3.64e-11   1.51e-04   9.37e-01  4.78e+10        1    2.15e-06    4.48e-04
 
-    Ceres Solver v1.12.0 Solve Report
-    ----------------------------------
+    Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.1.0)-metis-(5.1.0)-acceleratesparse-eigensparse)
+
                                          Original                  Reduced
     Parameter blocks                            4                        4
     Parameters                                  4                        4
     Residual blocks                             4                        4
-    Residual                                    4                        4
+    Residuals                                   4                        4
 
     Minimizer                        TRUST_REGION
 
     Dense linear algebra library            EIGEN
     Trust region strategy     LEVENBERG_MARQUARDT
-
                                             Given                     Used
     Linear solver                        DENSE_QR                 DENSE_QR
     Threads                                     1                        1
-    Linear solver threads                       1                        1
+    Linear solver ordering              AUTOMATIC                        4
 
     Cost:
     Initial                          1.075000e+02
-    Final                            1.791438e-14
+    Final                            1.120029e-15
     Change                           1.075000e+02
 
-    Minimizer iterations                       14
-    Successful steps                           14
+    Minimizer iterations                       15
+    Successful steps                           15
     Unsuccessful steps                          0
 
     Time (in seconds):
-    Preprocessor                            0.002
+    Preprocessor                         0.000311
 
-      Residual evaluation                   0.000
-      Jacobian evaluation                   0.000
-      Linear solver                         0.000
-    Minimizer                               0.001
+      Residual only evaluation           0.000002 (14)
+      Jacobian & residual evaluation     0.000023 (15)
+      Linear solver                      0.000043 (14)
+    Minimizer                            0.000163
 
-    Postprocessor                           0.000
-    Total                                   0.005
+    Postprocessor                        0.000012
+    Total                                0.000486
 
     Termination:                      CONVERGENCE (Gradient tolerance reached. Gradient max norm: 3.642190e-11 <= 1.000000e-10)
 
-    Final x1 = 0.000292189, x2 = -2.92189e-05, x3 = 4.79511e-05, x4 = 4.79511e-05
+    Final x1 = 0.000146222, x2 = -1.46222e-05, x3 = 2.40957e-05, x4 = 2.40957e-05
+
+
+
 
 It is easy to see that the optimal solution to this problem is at
 :math:`x_1=0, x_2=0, x_3=0, x_4=0` with an objective function value of
@@ -494,8 +498,8 @@
  Problem problem;
  for (int i = 0; i < kNumObservations; ++i) {
    CostFunction* cost_function =
-        new AutoDiffCostFunction<ExponentialResidual, 1, 1, 1>(
-            new ExponentialResidual(data[2 * i], data[2 * i + 1]));
+        new AutoDiffCostFunction<ExponentialResidual, 1, 1, 1>
+            (data[2 * i], data[2 * i + 1]);
    problem.AddResidualBlock(cost_function, nullptr, &m, &c);
  }
 
@@ -670,8 +674,8 @@
     // the client code.
     static ceres::CostFunction* Create(const double observed_x,
                                        const double observed_y) {
-      return (new ceres::AutoDiffCostFunction<SnavelyReprojectionError, 2, 9, 3>(
-                  new SnavelyReprojectionError(observed_x, observed_y)));
+      return new ceres::AutoDiffCostFunction<SnavelyReprojectionError, 2, 9, 3>
+        (observed_x, observed_y);
     }
 
    double observed_x;
@@ -728,7 +732,7 @@
 
 For a more sophisticated bundle adjustment example which demonstrates
 the use of Ceres' more advanced features including its various linear
-solvers, robust loss functions and local parameterizations see
+solvers, robust loss functions and manifolds see
 `examples/bundle_adjuster.cc
 <https://ceres-solver.googlesource.com/ceres-solver/+/master/examples/bundle_adjuster.cc>`_
 
@@ -862,23 +866,23 @@
    measurement and the predicted measurement is:
 
    .. math:: r_{ab} =
-	     \left[
-	     \begin{array}{c}
-	       R_a^T\left(p_b - p_a\right) - \hat{p}_{ab} \\
-	       \mathrm{Normalize}\left(\psi_b - \psi_a - \hat{\psi}_{ab}\right)
-	     \end{array}
-	     \right]
+             \left[
+             \begin{array}{c}
+               R_a^T\left(p_b - p_a\right) - \hat{p}_{ab} \\
+               \mathrm{Normalize}\left(\psi_b - \psi_a - \hat{\psi}_{ab}\right)
+             \end{array}
+             \right]
 
    where the function :math:`\mathrm{Normalize}()` normalizes the angle in the range
    :math:`[-\pi,\pi)`, and :math:`R` is the rotation matrix given by
 
    .. math:: R_a =
-	     \left[
-	     \begin{array}{cc}
-	       \cos \psi_a & -\sin \psi_a \\
-	       \sin \psi_a & \cos \psi_a \\
-	     \end{array}
-	     \right]
+             \left[
+             \begin{array}{cc}
+               \cos \psi_a & -\sin \psi_a \\
+               \sin \psi_a & \cos \psi_a \\
+             \end{array}
+             \right]
 
    To finish the cost function, we need to weight the residual by the
    uncertainty of the measurement. Hence, we pre-multiply the residual by the
@@ -886,10 +890,12 @@
    i.e. :math:`\Sigma_{ab}^{-\frac{1}{2}} r_{ab}` where :math:`\Sigma_{ab}` is
    the covariance.
 
-   Lastly, we use a local parameterization to normalize the orientation in the
-   range which is normalized between :math:`[-\pi,\pi)`.  Specially, we define
-   the :member:`AngleLocalParameterization::operator()` function to be:
-   :math:`\mathrm{Normalize}(\psi + \delta \psi)`.
+   Lastly, we use a manifold to normalize the orientation in the range
+   :math:`[-\pi,\pi)`.  Specially, we define the
+   :member:`AngleManifold::Plus()` function to be:
+   :math:`\mathrm{Normalize}(\psi + \Delta)` and
+   :member:`AngleManifold::Minus()` function to be
+   :math:`\mathrm{Normalize}(y) - \mathrm{Normalize}(x)`.
 
    This package includes an executable :member:`pose_graph_2d` that will read a
    problem definition file. This executable can work with any 2D problem
@@ -980,17 +986,18 @@
    i.e. :math:`\Sigma_{ab}^{-\frac{1}{2}} r_{ab}` where :math:`\Sigma_{ab}` is
    the covariance.
 
-   Given that we are using a quaternion to represent the orientation, we need to
-   use a local parameterization (:class:`EigenQuaternionParameterization`) to
+   Given that we are using a quaternion to represent the orientation,
+   we need to use a manifold (:class:`EigenQuaternionManifold`) to
    only apply updates orthogonal to the 4-vector defining the
-   quaternion. Eigen's quaternion uses a different internal memory layout for
-   the elements of the quaternion than what is commonly used. Specifically,
-   Eigen stores the elements in memory as :math:`[x, y, z, w]` where the real
-   part is last whereas it is typically stored first. Note, when creating an
-   Eigen quaternion through the constructor the elements are accepted in
-   :math:`w`, :math:`x`, :math:`y`, :math:`z` order. Since Ceres operates on
-   parameter blocks which are raw double pointers this difference is important
-   and requires a different parameterization.
+   quaternion. Eigen's quaternion uses a different internal memory
+   layout for the elements of the quaternion than what is commonly
+   used. Specifically, Eigen stores the elements in memory as
+   :math:`[x, y, z, w]` where the real part is last whereas it is
+   typically stored first. Note, when creating an Eigen quaternion
+   through the constructor the elements are accepted in :math:`w`,
+   :math:`x`, :math:`y`, :math:`z` order. Since Ceres operates on
+   parameter blocks which are raw double pointers this difference is
+   important and requires a different parameterization.
 
    This package includes an executable :member:`pose_graph_3d` that will read a
    problem definition file. This executable can work with any 3D problem
diff --git a/docs/source/numerical_derivatives.rst b/docs/source/numerical_derivatives.rst
index 57b46bf..8d7fb3a 100644
--- a/docs/source/numerical_derivatives.rst
+++ b/docs/source/numerical_derivatives.rst
@@ -61,8 +61,7 @@
   }
 
   CostFunction* cost_function =
-    new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
-      new Rat43CostFunctor(x, y));
+    new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(x, y);
 
 This is about the minimum amount of work one can expect to do to
 define the cost function. The only thing that the user needs to do is
@@ -326,7 +325,7 @@
 Compared to the *correct* value :math:`Df(1.0) = 140.73773557129658`,
 :math:`A(5, 1)` has a relative error of :math:`10^{-13}`. For
 comparison, the relative error for the central difference formula with
-the same stepsize (:math:`0.01/2^4 = 0.000625`) is :math:`10^{-5}`.
+the same step size (:math:`0.01/2^4 = 0.000625`) is :math:`10^{-5}`.
 
 The above tableau is the basis of Ridders' method for numeric
 differentiation. The full implementation is an adaptive scheme that
diff --git a/docs/source/solving_faqs.rst b/docs/source/solving_faqs.rst
index 64604c4..3842e4d 100644
--- a/docs/source/solving_faqs.rst
+++ b/docs/source/solving_faqs.rst
@@ -23,16 +23,13 @@
 
    2. For general sparse problems (i.e., the Jacobian matrix has a
       substantial number of zeros) use
-      ``SPARSE_NORMAL_CHOLESKY``. This requires that you have
-      ``SuiteSparse`` or ``CXSparse`` installed.
+      ``SPARSE_NORMAL_CHOLESKY``.
 
    3. For bundle adjustment problems with up to a hundred or so
       cameras, use ``DENSE_SCHUR``.
 
    4. For larger bundle adjustment problems with sparse Schur
-      Complement/Reduced camera matrices use ``SPARSE_SCHUR``. This
-      requires that you build Ceres with support for ``SuiteSparse``,
-      ``CXSparse`` or Eigen's sparse linear algebra libraries.
+      Complement/Reduced camera matrices use ``SPARSE_SCHUR``.
 
       If you do not have access to these libraries for whatever
       reason, ``ITERATIVE_SCHUR`` with ``SCHUR_JACOBI`` is an
diff --git a/docs/source/version_history.rst b/docs/source/version_history.rst
index 72ae832..50f8353 100644
--- a/docs/source/version_history.rst
+++ b/docs/source/version_history.rst
@@ -1,9 +1,320 @@
+.. default-domain:: cpp
+
+.. highlight:: c++
+
+.. cpp:namespace:: ceres
+
+
 .. _chapter-version-history:
 
 ===============
 Version History
 ===============
 
+2.2.0
+=====
+
+New Features
+------------
+
+#. Substantial improvement to threading performance across the board
+   (Dmitry Korchemkin)
+#. Mixed precision solves + iterative refinement when using ``CUDA`` or
+   CPU based dense linear solvers, or ``EIGEN_SPARSE`` as the sparse
+   linear algebra library. (Sameer Agarwal & Joydeep Biswas)
+#. Cuda based CGNR and preconditioner support (Joydeep Biswas & Sameer
+   Agarwal)
+#. Nested Dissection (``NESDIS``) is now supported as an ordering method
+   in addition to ``AMD``. (Sameer Agarwal, Alex Stewart & Sergiu
+   Deitsch)
+#. **Power Bundle Adjustment** is available as a linear solver and as
+   a preconditioner by the name of ``SCHUR POWER SERIES EXPANSION``
+   (Mark Shachkov).
+#. Generalized Euler Angle conversions (hs293go@)
+
+
+Backward Incompatible API Changes
+---------------------------------
+
+#. :class:`LocalParameterization` has been removed, use
+   :class:`Manifold` instead.
+#. Ceres Solver now requires a C++17 compliant compiler.
+#. Ceres Solver now requires CMake version 3.16 or later.
+#. Ceres Solver now requires SuiteSparse version 4.5.6 or later.
+#. OpenMP and NO_THREADING backends have been removed. C++ threads is
+   how all threading is done.
+#. Support for ``CX_SPARSE`` as a sparse linear algebra backend has
+   been removed. Similar or better performance can be expected from
+   ``Eigen`` as the sparse linear algebra library.
+
+
+Bug Fixes & Minor Changes
+-------------------------
+#. Optimize the computation of the LM diagonal in TinySolver
+#. Improvements to multi-threaded performance for small problems that
+   had regressed due to changes to threading (Dmitrity Korchemkin)
+#. Fix handling of M_PI for MSVC (Sergiu Deitsch)
+#. Add a default value for Solver::Summary::linear_solver_ordering_type (Sameer Agarwal)
+#. Make sure that the code compiles well with CUDA 11 (Dmitriy
+   Korchemkin)
+#. Rework MSVC warning suppression (Sergiu Deitsch)
+#. Add an example for EvaluationCallback (Sameer Agarwal)
+#. Add an example for IterationCallback (Sameer Agarwal)
+#. Add end-to-end BA tests for SCHUR_POWER_SERIES_EXPANSION (Sameer Agarwal)
+#. Update documentation for linear solvers (Sameer Agarwal)
+#. Add an accessor for the CostFunctor in DynamicAutoDiffCostFunction (Sameer Agarwal)
+#. Runtime check for cudaMallocAsync support (Dmitriy Korchemkin)
+#. Remove cuda-memcheck based tests (Sameer Agarwal)
+#. Modernize ``Sphinx`` related CMake handling as well the ``Sphinx``
+   build process in the terminal. (Sergiu Deitsch)
+#. Fix macos ``sprintf`` security related warnings (Sergiu Deitsch)
+#. Lots of Cuda releated build system fixes (Sergiu Deitsch, Dmitriy
+   Korchemkin, Jason Mak)
+#. Improved windows build support (Sergiu Deitsch)
+#. Various documentation fixes (Maxim Smolskiy, Evan Levine)
+#. Improved handling of large Jacobians (Sameer Agarwal)
+#. Improved handling of infinite initial cost (Sameer Agarwal)
+#. Improved traits support for Jets (Sameer Agarwal)
+#. Improved tests for Euler angle conversion routines (@Hs293Go)
+#. Use a std::tuple to store ProductManifold for better efficiency
+   (Sergiu Deitsch)
+#. Allow default construction of ProductManifold when underlying
+   manifolds have default constructors (Sergiu Deitsch)
+#. Move LineManifold and SphereManifold into their own headers (Sameer
+   Agarwal)
+#. Fix a byte vs number of elements error when dealing with CUDA
+   workspace computations (Joydeep Biswas)
+#. Hide and prevent internal symbols from being exported (Sergiu
+   Deitsch)
+#. Switch to imported SuiteSparse, CXSparse & METIS targets.
+#. Improve compilation on Ubuntu 20.04 (Sergiu Deitsch)
+#. Update to using gtest 1.11.0 (Sameer Agarwal)
+#. Fix Euler angle conversion code to not rely on constexpr
+   constrctors for Jets. (Sameer Agarwal)
+#. BlockRandomAccessSparseMatrix now uses a BlockSparseMatrix as
+   storage instead of TripletSparseMatrix. (Dmitriy Korchemkin)
+#. Deduction guide for DynamicAutoDiffCostFunction (Sergiu Deitsch)
+#. Explicit conversions from long to ints (Alexander Ivanov)
+#. Unused code deletion/commenting and code modernization (Alexander
+   Ivanov)
+#. Improve the bazel build & tests (Alexander Ivanov)
+#. Fix a bug in QuaternionRotatePoint introduced by the use of hypot
+   earlier in this release cycle (Jonathan Taylor & Sameer Agarwal)
+#. Lots of GitHub CI improvements (Sergiu Deitsch & Dmitry Korchemkin)
+#. Improve the robustness of the Cuda based dense linear algebra tests
+   (Joydeep Biswas)
+#. Refactor storage & threading support in BlockRandomAccessMatrix and
+   its subclasses (Sameer Agarwal)
+#. Fix a bug in CoordinateDescentMinimizer related to uninitialized
+   variables (Sameer Agarwal)
+#. Remove OpenMP and NO_THREADS backends. (Sameer Agarwal)
+#. Fix version string parsing starting with SuiteSparse 6.0 (Sergiu
+   Deitsch)
+#. Use FindCUDAToolkit for CMake >= 3.17 (Alex Stewart)
+#. Add a const accessor for the Problem::Options struct used by
+   Problem. (Alex Stewart)
+#. Fix a serious performance regression when using SuiteSparse
+   introduced in `d09f7e9d5e
+   <https://github.com/ceres-solver/ceres-solver/commit/d09f7e9d5e3bfab2d7ec7e81fd6a55786edca17a>`_. (Sameer
+   Agarwal)
+#. Fix the build on QNX (Alex Stewart)
+#. Improve testing macros and documentation for Manifolds (Alex
+   Stewart)
+#. Improved code formatting (Tyler Hovanec)
+#. Better use of std::unique_ptr in the code (Mike Vitus)
+#. Fix a memory leak in ContextImpl (Sameer Agarwal)
+#. Faster locking when num_thread = 1 (Sameer Agarwal)
+#. Fix how x_norm is computed in TrustRegionMinimizer (Sameer Agarwal)
+#. Faster JACOBI preconditioner for CGNR (Sameer Agarwal)
+#. Convert internal enums to class enums (Sameer Agarwal)
+#. Improve the code in small_blas to be more compiler friendly (Sameer
+   Agarwal)
+#. Add the ability to specify the pivot threshold in
+   ::class::`Covariance::Options` (Sameer Agarwal)
+#. Modernize the internals to use C++17 (Sameer Agarwal)
+#. Choose SPMV algorithm based on the CUDA SDK Version (Joydeep
+   Biswas)
+#. Better defaults in ``bundle_adjuster.cc`` (Sameer Agarwal)
+#. Use ``foo.data()`` instead of ``&foo[0]`` (Sameer Agarwal)
+#. Fix GCC 12.1.1 LTO -Walloc-size-larger-than= warnings (Sergiu
+   Deitsch)
+#. Improved determinism in tests by re-using the same PRNG (Sergiu
+   Deitsch)
+#. Improved docs for ``vcpkg`` installation. (Sergiu Deitsch)
+#. Update FindGlog.cmake to create glog::glog target (KrisThielemans@)
+#. Improve consistency & correctness of Sphere & Line Manifolds
+   (Julio L. Paneque)
+#. Remove ``ceres/internal/random.h`` in favor of ``<random>``.
+#. Fix a crash in ``InnerProductComputer`` (Sameer Agarwal)
+#. Various fixes to improve compilation on windows using MinGW & MSVC
+   (Sergiu Deitsch)
+#. Fix fmin/fmax() to use Jet averaging on equality (Alex Stewart)
+#. Fix use of conditional preprocessor checks within a macro in tests
+   (Alex Stewart)
+#. Better support for ``CUDA memcheck`` (Joydeep Biswas)
+#. Improve the logic for linking to the platform specific threading
+   library (Sergiu Deitsch)
+#. Generate the version string at compile time (Sergiu Deitsch)
+#. :class:`NumericDiffFirstOrderFunction` can now take a dynamically
+   sized parameter vector. (Sameer Agarwal)
+#. Fix compilation with SuiteSparse 7.2.0 (Mark Shackov)
+
+2.1.0
+=====
+
+New Features
+------------
+
+#. Support for CUDA based dense solvers - ``DENSE_QR``,
+   ``DENSE_NORMAL_CHOLESKY`` & ``DENSE_SCHUR`` (Joydeep Biswas, Sameer
+   Agarwal)
+
+#. :class:`Manifold` is the new
+   :class:`LocalParameterization`. Version 2.1 is the transition
+   release where users can use both :class:`LocalParameterization` as
+   well as :class:`Manifold` objects as they transition from the
+   former to the latter. :class:`LocalParameterization` will be
+   removed in version 2.2. There should be no numerical change to the
+   results as a result of this change. (Sameer Agarwal, Johannes Beck,
+   Sergiu Deitsch)
+
+#. A number of changes to :class:`Jet` s (Sergiu Deitsch)
+
+   * :class:`Jet` gained support for, ``copysign``, ``fma`` (fused
+     multiply-add), ``midpoint`` (C++20 and above), ``lerp`` (C++20
+     and above), 3-argument ``hypot`` (C++17 and above), ``log10``,
+     ``log1p``, ``exp1m``, ``norm`` (squared :math:`L^2` norm).
+
+   * Quiet floating-point comparison: ``isless``, ``isgreater``,
+     ``islessgreater``, ``islessequal``, ``isgreaterequal``,
+     ``isunordered``, ``signbit``, ``fdim``
+
+   * Categorization and comparison operations are applied exclusively
+     and consistently to the scalar part of a Jet now: ``isnan``,
+     ``isinf``, ``isnormal``, ``isfinite``, ``fpclassify`` (new),
+     ``fmin``, ``fmax``
+
+   * It is now possible to safely compare a :class:`Jet` against a scalar
+     (or literal) without constructing a :class:`Jet` first (even if it's
+     nested):
+
+     .. code-block:: c++
+
+        Jet<Jet<Jet<T, N>, M>, O> x;
+        if (x == 2) { } // equivalent to x.a.a.a == 2
+
+
+     This enables interaction with various arithmetic functions that
+     expect a scalar like instance, such as ``boost::math::pow<-N>``
+     for reciprocal computation.
+
+#. Add :class:`NumericDiffFirstOrderFunction` (Sameer Agarwal)
+
+
+Backward Incompatible API Changes
+---------------------------------
+
+#. :class:`LocalParameterization` is deprecated. It will be removed in
+   version 2.2. Use :class:`Manifold` instead.
+#. Classification functions like ``IsFinite`` are deprecated. Use the
+   ``C++11`` functions (``isfinite``, ``isnan`` etc) going
+   forward. However to maintain consistent behaviour with comparison
+   operators, these functions only inspect the scalar part of the
+   :class:`Jet`.
+
+Bug Fixes & Minor Changes
+-------------------------
+
+#. Worked around an MSVC ordering bug when using C++17/20 (Sergiu
+   Deitsch)
+#. Added a CITATION.cff file. (Sergiu Deitsch)
+#. Updated included gtest version to 1.11.0. This should fix some
+   ``C++20`` compilation problems. (Sameer Agarwal).
+#. Workaround ``MSVC`` ``STL`` deficiency in ``C++17`` mode (Sergiu
+   Deitsch)
+#. Fix ``Jet`` test failures on ``ARMv8`` with recent ``Xcode``
+   (Sergiu Deitsch)
+#. Fix unused arguments of ``Make1stOrderPerturbation`` (Dmitriy
+   Korchemkin)
+#. Fix ``SuiteSparse`` path and version reporting (Sergiu Deitsch)
+#. Enable `GitHub` workflows and deprecate ``TravisCI`` (Sergiu
+   Deitsch)
+#. Add missing includes (Sergiu Deitsch, Sameer Agarwal)
+#. Fix path for ``cuda-memcheck`` tests (Joydeep Biswas)
+#. ClangFormat cleanup (Sameer Agarwal)
+#. Set ``CMP0057`` policy for ``IN_LIST`` operator in
+   ``FindSuiteSparse.cmake`` (Brent Yi)
+#. Do not define unusable import targets (Sergiu Deitsch)
+#. Fix Ubuntu 18.04 shared library build (Sergiu Deitsch)
+#. Force ``C++`` linker when building the ``C`` API (Sergiu Deitsch)
+#. Modernize the code to be inline with ``C++14`` (Sergiu Deitsch,
+   Sameer Agarwal)
+#. Lots of fixes to make Ceres compile out of the box on Windows
+   (Sergiu Deitsch)
+#. Standardize path handling using ``GNUImstallDirs`` (Sergiu Deitsch)
+#. Add final specifier to classes to help the compiler with
+   devirtualization (Sameer Agarwal)
+#. LOTs of clean & modernization of the CMake build files (Sergiu
+   Deitsch & Alex Stewart)
+#. Simplification to the symbol export logic (Sergiu Deitsch)
+#. Add cmake option ``ENABLE_BITCODE`` for iOS builds (John Harrison)
+#. Add const accessor for functor wrapped by auto/numeric-diff objects
+   (Alex Stewart)
+#. Cleanup & refactor ``jet_test.cc``. (Sameer Agarwal)
+#. Fix docs of supported sparse backends for mixed precision solvers
+   (Alex Stewart)
+#. Fix C++20 compilation (Sergiu Deitsch)
+#. Add an example for ``BiCubicInterpolator`` (Dmitriy Korcchemkin)
+#. Add a section to the documentation on implicit and inverse function
+   theorems (Sameer Agarwal)
+#. Add a note about Trigg's correction (Sameer Agarwal)
+#. Fix the docs for ``Problem::RemoveResidualBlock`` &
+   ``Problem::RemoveParameterBlock`` (Sameer Agarwal)
+#. Fix an incorrect check in ``reorder_program.cc`` (William Gandler)
+#. Add ``function_tolerance`` based convergence testing to ``TinySolver``
+   (Sameer Agarwal).
+#. Fix a number of typos in ``rotation.h`` (@yiping)
+#. Fix a typo in ``interfacing_with_autodiff.rst`` (@tangobravo)
+#. Fix a matrix sizing bug in covariance_impl.cc (William Gandler)
+#. Fix a bug in ``system_test.cc`` (William Gandler)
+#. Fix the Jacobian computation in ``trust_region_minimizer_test.cc``
+   (William Gandler)
+#. Fix a bug in ``local_parameterization_test.cc`` (William Gandler)
+#. Add accessors to ``GradientProblem`` (Sameer Agarwal)
+#. Refactor ``small_blas_gemm_benchmark`` (Ahmed Taei)
+#. Refactor ``small_blas_test`` (Ahmed Taei)
+#. Fix dependency check for building documentation (Sumit Dey)
+#. Fix an errant double link in the docs (Timon Knigge)
+#. Fix a typo in the version history (Noah Snavely)
+#. Fix typo in LossFunctionWrapper sample code (Dmitriy Korchemkin)
+#. Add fmax/fmin overloads for scalars (Alex Karatarakis)
+#. Introduce benchmarks for ``Jet`` operations (Alexander Karatarakis)
+#. Fix typos in documentation and fix the documentation for
+   ``IterationSummary`` (Alexander Karatarakis)
+#. Do not check MaxNumThreadsAvailable if the thread number is set
+   to 1. (Fuhao Shi)
+#. Add a macro ``CERES_GET_FLAG``. (Sameer Agarwal)
+#. Reduce log spam in ``covariance_impl.cc`` (Daniel Henell)
+#. Fix FindTBB version detection with TBB >= 2021.1.1 (Alex Stewart)
+#. Fix Eigen3_VERSION (Florian Berchtold)
+#. Allow Unity Build (Tobias Schluter)
+#. Make miniglog's InitGoogleLogging argument const (Tobias Schluter)
+#. Use portable expression for constant 2/sqrt(pi) (Tobias Schluter)
+#. Fix a number of compile errors related (Austin Schuch)
+
+   * ``format not a string literal``
+   * ``-Wno-maybe-uninitialized error``
+   * ``nonnull arg compared to NULL``
+   * ``-Wno-format-nonliteral``
+   * ``-Wmissing-field-initializers``
+   * ``-Werror``
+
+#. Fix ``cc_binary`` includes so examples build as an external repo
+   (Austin Schuh)
+#. Fix an explicit double in TinySolver (Bogdan Burlacu)
+#. Fix unit quaternion rotation (Mykyta Kozlov)
+
+
 2.0.0
 =====
 
@@ -1532,8 +1843,8 @@
 =======
 
 Ceres Solver grew out of the need for general least squares solving at
-Google. In early 2010, Sameer Agarwal and Fredrik Schaffalitzky
-started the development of Ceres Solver. Fredrik left Google shortly
+Google. In early 2010, Sameer Agarwal and Frederik Schaffalitzky
+started the development of Ceres Solver. Frederik left Google shortly
 thereafter and Keir Mierle stepped in to take his place. After two
 years of on-and-off development, Ceres Solver was released as open
 source in May of 2012.
