Squashed 'third_party/ceres/' content from commit e51e9b4

Change-Id: I763587619d57e594d3fa158dc3a7fe0b89a1743b
git-subtree-dir: third_party/ceres
git-subtree-split: e51e9b46f6ca88ab8b2266d0e362771db6d98067
diff --git a/docs/source/analytical_derivatives.rst b/docs/source/analytical_derivatives.rst
new file mode 100644
index 0000000..2a3a404
--- /dev/null
+++ b/docs/source/analytical_derivatives.rst
@@ -0,0 +1,192 @@
+.. default-domain:: cpp
+
+.. cpp:namespace:: ceres
+
+.. _chapter-analytical_derivatives:
+
+====================
+Analytic Derivatives
+====================
+
+Consider the problem of fitting the following curve (`Rat43
+<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
+data:
+
+.. math::
+  y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}
+
+That is, given some data :math:`\{x_i, y_i\},\ \forall i=1, \dots, n`,
+determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
+fit this data.
+
+This can be stated as the problem of finding the values of
+:math:`b_1, b_2, b_3` and :math:`b_4` that minimize the following
+objective function [#f1]_:
+
+.. math::
+   \begin{align}
+   E(b_1, b_2, b_3, b_4)
+   &= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
+   &= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
+   \end{align}
+
+To solve this problem using Ceres Solver, we need to define a
+:class:`CostFunction` that computes the residual :math:`f` for a given
+:math:`x` and :math:`y`, and its derivatives with respect to
+:math:`b_1, b_2, b_3` and :math:`b_4`.
+
+Using elementary differential calculus, we can see that:
+
+.. math::
+  \begin{align}
+  D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
+  D_2 f(b_1, b_2, b_3, b_4; x,y) &=
+  \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
+  D_3 f(b_1, b_2, b_3, b_4; x,y) &=
+  \frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
+  D_4 f(b_1, b_2, b_3, b_4; x,y) & = \frac{b_1  \log\left(1+e^{b_2-b_3x}\right) }{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
+  \end{align}
+
+With these derivatives in hand, we can now implement the
+:class:`CostFunction` as:
+
+.. code-block:: c++
+
+  class Rat43Analytic : public SizedCostFunction<1,4> {
+     public:
+       Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
+       virtual ~Rat43Analytic() {}
+       virtual bool Evaluate(double const* const* parameters,
+                             double* residuals,
+                             double** jacobians) const {
+         const double b1 = parameters[0][0];
+         const double b2 = parameters[0][1];
+         const double b3 = parameters[0][2];
+         const double b4 = parameters[0][3];
+
+         residuals[0] = b1 * pow(1 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
+
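+         // Ceres passes jacobians == nullptr when only the residual is
+         // needed, and may pass a nullptr entry for a parameter block
+         // whose Jacobian is not required.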
+         if (!jacobians) return true;
+         double* jacobian = jacobians[0];
+         if (!jacobian) return true;
+
+         jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
+         jacobian[1] = -b1 * exp(b2 - b3 * x_) *
+                       pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
+         jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
+                       pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
+         jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
+                       pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
+         return true;
+       }
+
+      private:
+       const double x_;
+       const double y_;
+   };
+
+This code is tedious, hard to read, and full of redundancy. So in
+practice we cache the common sub-expressions to improve its
+efficiency, which gives us something like:
+
+.. code-block:: c++
+
+  class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
+     public:
+       Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
+       virtual ~Rat43AnalyticOptimized() {}
+       virtual bool Evaluate(double const* const* parameters,
+                             double* residuals,
+                             double** jacobians) const {
+         const double b1 = parameters[0][0];
+         const double b2 = parameters[0][1];
+         const double b3 = parameters[0][2];
+         const double b4 = parameters[0][3];
+
+         const double t1 = exp(b2 - b3 * x_);
+         const double t2 = 1 + t1;
+         const double t3 = pow(t2, -1.0 / b4);
+         residuals[0] = b1 * t3 - y_;
+
+         if (!jacobians) return true;
+         double* jacobian = jacobians[0];
+         if (!jacobian) return true;
+
+         const double t4 = pow(t2, -1.0 / b4 - 1);
+         jacobian[0] = t3;
+         jacobian[1] = -b1 * t1 * t4 / b4;
+         jacobian[2] = -x_ * jacobian[1];
+         jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
+         return true;
+       }
+
+     private:
+       const double x_;
+       const double y_;
+   };
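+
+Either version is used in the usual way, by adding one residual block
+per observation to a :class:`Problem`. A minimal sketch, where the
+data arrays ``x_data`` and ``y_data``, their size ``kNumObservations``
+and the initial estimate of ``b`` are placeholders to be supplied by
+the caller:
+
+.. code-block:: c++
+
+  double b[4] = {700.0, 5.0, 0.75, 1.3};  // Illustrative initial estimate.
+
+  Problem problem;
+  for (int i = 0; i < kNumObservations; ++i) {
+    // One residual per observation; the nullptr requests a plain
+    // squared loss, matching the objective E above.
+    problem.AddResidualBlock(
+        new Rat43AnalyticOptimized(x_data[i], y_data[i]), nullptr, b);
+  }
+
+  Solver::Options options;
+  Solver::Summary summary;
+  Solve(options, &problem, &summary);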
+
+What is the difference in performance of these two implementations?
+
+==========================   =========
+CostFunction                 Time (ns)
+==========================   =========
+Rat43Analytic                      255
+Rat43AnalyticOptimized              92
+==========================   =========
+
+``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
+``Rat43Analytic``.  This difference in run-time is not uncommon. To
+get the best performance out of analytically computed derivatives, one
+usually needs to optimize the code to account for common
+sub-expressions.
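+
+How might one reproduce such a measurement? A minimal sketch of a
+micro-benchmark using ``std::chrono``; the data point, starting
+values and iteration count below are illustrative, and this is not
+the harness that produced the numbers above:
+
+.. code-block:: c++
+
+  #include <chrono>
+  #include <iostream>
+
+  void BenchmarkRat43() {
+    // One illustrative observation and starting point.
+    Rat43AnalyticOptimized cost_function(9.0, 16.1);
+    double b[4] = {700.0, 5.0, 0.75, 1.3};
+    double* parameters[] = {b};
+    double jacobian[4];
+    double* jacobians[] = {jacobian};
+    double residual = 0.0;
+    double sum = 0.0;  // Accumulated so the calls cannot be optimized away.
+
+    const int kNumIterations = 1000000;
+    const auto start = std::chrono::steady_clock::now();
+    for (int i = 0; i < kNumIterations; ++i) {
+      cost_function.Evaluate(parameters, &residual, jacobians);
+      sum += residual;
+    }
+    const auto end = std::chrono::steady_clock::now();
+    const double total_ns =
+        std::chrono::duration<double, std::nano>(end - start).count();
+    std::cout << "sum: " << sum << "\n"
+              << "time per Evaluate: " << total_ns / kNumIterations
+              << " ns\n";
+  }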
+
+
+When should you use analytical derivatives?
+===========================================
+
+#. The expressions are simple, e.g. mostly linear.
+
+#. A computer algebra system like `Maple
+   <https://www.maplesoft.com/products/maple/>`_, `Mathematica
+   <https://www.wolfram.com/mathematica/>`_, or `SymPy
+   <http://www.sympy.org/en/index.html>`_ can be used to symbolically
+   differentiate the objective function and generate the C++ code to
+   evaluate the derivatives.
+
+#. Performance is of utmost concern and there is algebraic structure
+   in the terms that you can exploit to get better performance than
+   automatic differentiation.
+
+   That said, getting the best performance out of analytical
+   derivatives requires a non-trivial amount of work.  Before going
+   down this path, it is useful to measure the amount of time being
+   spent evaluating the Jacobian as a fraction of the total solve
+   time (see the sketch after this list) and remember that `Amdahl's
+   Law <https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend.
+
+#. There is no other way to compute the derivatives, e.g. you
+   wish to compute the derivative of the root of a polynomial:
+
+   .. math::
+     a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0
+
+
+   with respect to :math:`x` and :math:`y`. This requires the use of
+   the `Inverse Function Theorem
+   <https://en.wikipedia.org/wiki/Inverse_function_theorem>`_; see the
+   worked example after this list.
+
+#. You love the chain rule and actually enjoy doing all the algebra by
+   hand.
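+
+For the third point, the fraction of the total solve time spent on
+Jacobian evaluation can be read off :class:`Solver::Summary` after a
+solve; a minimal sketch, assuming ``options`` and ``problem`` have
+been set up as usual:
+
+.. code-block:: c++
+
+  Solver::Summary summary;
+  Solve(options, &problem, &summary);
+  std::cout << "Jacobian evaluation took "
+            << summary.jacobian_evaluation_time_in_seconds /
+               summary.total_time_in_seconds
+            << " of the total solve time.\n";
+
+For the fourth point, implicitly differentiating the polynomial with
+respect to :math:`x`, treating the root as a function :math:`z(x, y)`,
+gives
+
+.. math::
+  \frac{\partial z}{\partial x} =
+  -\frac{\frac{\partial a_3}{\partial x}z^3 +
+         \frac{\partial a_2}{\partial x}z^2 +
+         \frac{\partial a_1}{\partial x}z +
+         \frac{\partial a_0}{\partial x}}
+        {3a_3z^2 + 2a_2z + a_1}
+
+and similarly for :math:`\partial z / \partial y`.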
+
+
+.. rubric:: Footnotes
+
+.. [#f1] The notion of best fit depends on the choice of the objective
+         function used to measure the quality of fit, which in turn
+         depends on the underlying noise process which generated the
+         observations. Minimizing the sum of squared differences is
+         the right thing to do when the noise is `Gaussian
+         <https://en.wikipedia.org/wiki/Normal_distribution>`_. In
+         that case the optimal value of the parameters is the `Maximum
+         Likelihood Estimate
+         <https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.
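+
+         To see why, suppose each observation was generated as
+         :math:`y_i = \frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} +
+         \epsilon_i` with :math:`\epsilon_i \sim N(0, \sigma^2)`.
+         Then the negative log-likelihood of the data is, up to terms
+         that do not depend on the parameters,
+
+         .. math::
+           \frac{1}{2\sigma^2} \sum_i f^2(b_1, b_2, b_3, b_4; x_i, y_i),
+
+         so maximizing the likelihood is the same as minimizing
+         :math:`E`.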