.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-analytical_derivatives:

====================
Analytic Derivatives
====================

Consider the problem of fitting the following curve (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
data:

.. math::
  y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}

That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,\dots,n`,
determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
fit this data.

This can be stated as the problem of finding the values of
:math:`b_1, b_2, b_3` and :math:`b_4` that minimize the following
objective function [#f1]_:

.. math::
  \begin{align}
  E(b_1, b_2, b_3, b_4)
  &= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
  &= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
  \end{align}

To solve this problem using Ceres Solver, we need to define a
:class:`CostFunction` that computes the residual :math:`f` for a given
:math:`x` and :math:`y` and its derivatives with respect to
:math:`b_1, b_2, b_3` and :math:`b_4`.

Using elementary differential calculus, we can see that:

.. math::
  \begin{align}
  D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
  D_2 f(b_1, b_2, b_3, b_4; x,y) &=
  \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  D_3 f(b_1, b_2, b_3, b_4; x,y) &=
  \frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  D_4 f(b_1, b_2, b_3, b_4; x,y) &= \frac{b_1 \log\left(1+e^{b_2-b_3x}\right) }{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
  \end{align}
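
For example, writing :math:`t = e^{b_2-b_3x}` (so that
:math:`\partial t/\partial b_2 = t`), :math:`D_2 f` is just an application of
the chain rule:

.. math::
  D_2 f = b_1 \left(-\frac{1}{b_4}\right)(1+t)^{-1/b_4 - 1}
          \frac{\partial t}{\partial b_2}
        = \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4+1}},

and the remaining derivatives follow the same pattern.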

With these derivatives in hand, we can now implement the
:class:`CostFunction` as:

.. code-block:: c++

  class Rat43Analytic : public SizedCostFunction<1,4> {
    public:
      Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
      virtual ~Rat43Analytic() {}
      virtual bool Evaluate(double const* const* parameters,
                            double* residuals,
                            double** jacobians) const {
        const double b1 = parameters[0][0];
        const double b2 = parameters[0][1];
        const double b3 = parameters[0][2];
        const double b4 = parameters[0][3];

        residuals[0] = b1 * pow(1 + exp(b2 - b3 * x_), -1.0 / b4) - y_;

        // Ceres calls Evaluate with jacobians == nullptr (or with
        // jacobians[0] == nullptr) when only the residual is needed.
        if (!jacobians) return true;
        double* jacobian = jacobians[0];
        if (!jacobian) return true;

        // jacobians[0] is a 1 x 4 row-major array holding the partial
        // derivatives of the residual with respect to b1, b2, b3 and b4.
        jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
        jacobian[1] = -b1 * exp(b2 - b3 * x_) *
                      pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
        jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
                      pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
        jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
                      pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
        return true;
      }

    private:
      const double x_;
      const double y_;
  };

This code is tedious to write, hard to read, and contains a lot of
redundancy, so in practice we cache some of the sub-expressions to
improve efficiency, which gives us something like:

.. code-block:: c++

  class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
    public:
      Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
      virtual ~Rat43AnalyticOptimized() {}
      virtual bool Evaluate(double const* const* parameters,
                            double* residuals,
                            double** jacobians) const {
        const double b1 = parameters[0][0];
        const double b2 = parameters[0][1];
        const double b3 = parameters[0][2];
        const double b4 = parameters[0][3];

        const double t1 = exp(b2 - b3 * x_);
        const double t2 = 1 + t1;
        const double t3 = pow(t2, -1.0 / b4);
        residuals[0] = b1 * t3 - y_;

        if (!jacobians) return true;
        double* jacobian = jacobians[0];
        if (!jacobian) return true;

        const double t4 = pow(t2, -1.0 / b4 - 1);
        jacobian[0] = t3;
        jacobian[1] = -b1 * t1 * t4 / b4;
        jacobian[2] = -x_ * jacobian[1];
        jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
        return true;
      }

    private:
      const double x_;
      const double y_;
  };
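
Either version can then be used like any other Ceres cost function. As a
minimal sketch of the plumbing (the ``x_data``/``y_data`` vectors, the
function name and the starting values below are placeholders, not part of the
original example), the optimized cost function might be hooked up to a
:class:`Problem` as follows:

.. code-block:: c++

  #include <iostream>
  #include <vector>

  #include "ceres/ceres.h"

  void FitRat43(const std::vector<double>& x_data,
                const std::vector<double>& y_data) {
    // Parameter block b = (b1, b2, b3, b4), starting from a rough guess.
    double b[4] = {100.0, 10.0, 1.0, 1.0};

    ceres::Problem problem;
    for (size_t i = 0; i < x_data.size(); ++i) {
      // One residual block per observation; by default the problem takes
      // ownership of the cost function.
      problem.AddResidualBlock(
          new Rat43AnalyticOptimized(x_data[i], y_data[i]),
          nullptr,  // No loss function, i.e. plain least squares.
          b);
    }

    ceres::Solver::Options options;
    options.minimizer_progress_to_stdout = true;
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);
    std::cout << summary.BriefReport() << "\n";
  }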

What is the difference in performance of these two implementations?

==========================  =========
CostFunction                Time (ns)
==========================  =========
Rat43Analytic               255
Rat43AnalyticOptimized      92
==========================  =========

``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
``Rat43Analytic``. This difference in run-time is not uncommon. To
get the best performance out of analytically computed derivatives, one
usually needs to optimize the code to account for common
sub-expressions.
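
The exact numbers depend on the data, the compiler and the hardware, but the
gap is easy to verify. One simple way to measure this kind of per-call cost
is to time repeated calls to ``Evaluate`` directly; the sketch below is only
an illustration of that approach (the sample point, parameter values and
iteration count are arbitrary), not the benchmark that produced the table
above.

.. code-block:: c++

  #include <chrono>
  #include <iostream>

  #include "ceres/ceres.h"

  // Returns the average time per call to Evaluate(), in nanoseconds.
  double TimeEvaluate(const ceres::CostFunction& cost_function,
                      const int num_iterations) {
    double b[4] = {100.0, 10.0, 1.0, 1.0};
    const double* parameters[] = {b};
    double residual = 0.0;
    double jacobian[4];
    double* jacobians[] = {jacobian};

    double sink = 0.0;  // Keeps the compiler from discarding the loop.
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < num_iterations; ++i) {
      cost_function.Evaluate(parameters, &residual, jacobians);
      sink += residual;
    }
    const auto end = std::chrono::steady_clock::now();
    std::cerr << "checksum: " << sink << "\n";
    return std::chrono::duration<double, std::nano>(end - start).count() /
           num_iterations;
  }

  int main() {
    const int kNumIterations = 10000000;
    std::cout << "Rat43Analytic          : "
              << TimeEvaluate(Rat43Analytic(1.0, 2.0), kNumIterations)
              << " ns\n";
    std::cout << "Rat43AnalyticOptimized : "
              << TimeEvaluate(Rat43AnalyticOptimized(1.0, 2.0), kNumIterations)
              << " ns\n";
    return 0;
  }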


When should you use analytical derivatives?
===========================================

#. The expressions are simple, e.g. mostly linear.

#. A computer algebra system like `Maple
   <https://www.maplesoft.com/products/maple/>`_, `Mathematica
   <https://www.wolfram.com/mathematica/>`_, or `SymPy
   <http://www.sympy.org/en/index.html>`_ can be used to symbolically
   differentiate the objective function and generate the C++ code to
   evaluate the resulting derivatives.

#. Performance is of utmost concern and there is algebraic structure
   in the terms that you can exploit to get better performance than
   automatic differentiation.

   That said, getting the best performance out of analytical
   derivatives requires a non-trivial amount of work. Before going
   down this path, it is useful to measure the amount of time being
   spent evaluating the Jacobian as a fraction of the total solve time
   (see the sketch after this list) and to remember that `Amdahl's Law
   <https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend.

#. There is no other way to compute the derivatives, e.g. you
   wish to compute the derivative of the root of a polynomial:

   .. math::
     a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0

   with respect to :math:`x` and :math:`y`. This requires the use of
   the `Implicit Function Theorem
   <https://en.wikipedia.org/wiki/Implicit_function_theorem>`_ (a short
   derivation is sketched after this list).

#. You love the chain rule and actually enjoy doing all the algebra by
   hand.
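
Two of the points above deserve a brief sketch. For the measurement mentioned
in point 3, :class:`Solver::Summary` already records the relevant timings, so
the check is only a few lines (this assumes a ``problem`` and ``options`` have
been set up as in the earlier fitting sketch):

.. code-block:: c++

  ceres::Solver::Summary summary;
  ceres::Solve(options, &problem, &summary);

  // Fraction of the total solve time spent evaluating Jacobians. If this
  // fraction is small, hand-optimizing the derivative code cannot buy much
  // overall speedup (Amdahl's Law).
  const double jacobian_fraction =
      summary.jacobian_evaluation_time_in_seconds /
      summary.total_time_in_seconds;
  std::cout << "Jacobian evaluation: " << 100.0 * jacobian_fraction
            << "% of total solve time\n";

And for the implicit differentiation needed in point 4: writing
:math:`g(x, y, z) = a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y)` and
assuming :math:`\partial g/\partial z \neq 0` at the root, differentiating
:math:`g(x, y, z(x, y)) = 0` with respect to :math:`x` gives

.. math::
  \frac{\partial z}{\partial x}
  = -\frac{\partial g/\partial x}{\partial g/\partial z}
  = -\frac{z^3 \frac{\partial a_3}{\partial x}
           + z^2 \frac{\partial a_2}{\partial x}
           + z \frac{\partial a_1}{\partial x}
           + \frac{\partial a_0}{\partial x}}
          {3a_3z^2 + 2a_2z + a_1},

and similarly for :math:`\partial z/\partial y`.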


.. rubric:: Footnotes

.. [#f1] The notion of best fit depends on the choice of the objective
         function used to measure the quality of fit, which in turn
         depends on the underlying noise process which generated the
         observations. Minimizing the sum of squared differences is
         the right thing to do when the noise is `Gaussian
         <https://en.wikipedia.org/wiki/Normal_distribution>`_. In
         that case the optimal value of the parameters is the `Maximum
         Likelihood Estimate
         <https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.