.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-analytical_derivatives:

====================
Analytic Derivatives
====================

Consider the problem of fitting the following curve (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
data:

.. math::
  y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}

That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,\dots,n`,
determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
fit this data.

This can be stated as the problem of finding the values of
:math:`b_1, b_2, b_3` and :math:`b_4` that minimize the following
objective function [#f1]_:

.. math::
  \begin{align}
  E(b_1, b_2, b_3, b_4)
  &= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
  &= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
  \end{align}

To solve this problem using Ceres Solver, we need to define a
:class:`CostFunction` that computes the residual :math:`f` for a given
:math:`x` and :math:`y` and its derivatives with respect to
:math:`b_1, b_2, b_3` and :math:`b_4`.

Using elementary differential calculus, we can see that:

.. math::
  \begin{align}
  D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
  D_2 f(b_1, b_2, b_3, b_4; x,y) &=
  \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  D_3 f(b_1, b_2, b_3, b_4; x,y) &=
  \frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  D_4 f(b_1, b_2, b_3, b_4; x,y) &= \frac{b_1 \log\left(1+e^{b_2-b_3x}\right)}{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
  \end{align}

With these derivatives in hand, we can now implement the
:class:`CostFunction` as:

.. code-block:: c++

  class Rat43Analytic : public SizedCostFunction<1,4> {
   public:
    Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
    virtual ~Rat43Analytic() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      const double b1 = parameters[0][0];
      const double b2 = parameters[0][1];
      const double b3 = parameters[0][2];
      const double b4 = parameters[0][3];

      residuals[0] = b1 * pow(1 + exp(b2 - b3 * x_), -1.0 / b4) - y_;

      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
      jacobian[1] = -b1 * exp(b2 - b3 * x_) *
                    pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
      jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
                    pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
      jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
                    pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
      return true;
    }

   private:
    const double x_;
    const double y_;
  };
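
Hand-coded Jacobians like this are easy to get subtly wrong, so
before trusting one it is worth comparing it against numeric
differentiation. Below is a minimal sketch of such a check; it is not
part of the original example, and the data point and parameter values
in it are arbitrary placeholders.

.. code-block:: c++

  #include <cstdio>

  // Compare the analytic Jacobian of Rat43Analytic against central
  // finite differences at a single (placeholder) test point.
  void CheckRat43Jacobian() {
    const double x = 1.0, y = 16.0;         // placeholder data point
    double b[4] = {700.0, 5.0, 0.75, 1.3};  // placeholder parameters
    Rat43Analytic cost_function(x, y);
    double* parameter_blocks[1] = {b};

    double residual;
    double analytic_jacobian[4];
    double* jacobian_blocks[1] = {analytic_jacobian};
    cost_function.Evaluate(parameter_blocks, &residual, jacobian_blocks);

    const double kStep = 1e-6;
    for (int i = 0; i < 4; ++i) {
      // Central difference: (f(b + h e_i) - f(b - h e_i)) / 2h.
      const double original = b[i];
      double forward, backward;
      b[i] = original + kStep;
      cost_function.Evaluate(parameter_blocks, &forward, nullptr);
      b[i] = original - kStep;
      cost_function.Evaluate(parameter_blocks, &backward, nullptr);
      b[i] = original;
      const double numeric = (forward - backward) / (2.0 * kStep);
      std::printf("D_%d analytic: % .12e numeric: % .12e\n",
                  i + 1, analytic_jacobian[i], numeric);
    }
  }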

The code in ``Rat43Analytic`` is tedious to write, hard to read, and
contains a good deal of redundancy. In practice we would cache the
common sub-expressions to improve efficiency, which gives us something
like:

.. code-block:: c++

  class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
   public:
    Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
    virtual ~Rat43AnalyticOptimized() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      const double b1 = parameters[0][0];
      const double b2 = parameters[0][1];
      const double b3 = parameters[0][2];
      const double b4 = parameters[0][3];

      const double t1 = exp(b2 - b3 * x_);
      const double t2 = 1 + t1;
      const double t3 = pow(t2, -1.0 / b4);
      residuals[0] = b1 * t3 - y_;

      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      const double t4 = pow(t2, -1.0 / b4 - 1);
      jacobian[0] = t3;
      jacobian[1] = -b1 * t1 * t4 / b4;
      jacobian[2] = -x_ * jacobian[1];
      jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
      return true;
    }

   private:
    const double x_;
    const double y_;
  };
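
Either implementation plugs into a least squares problem in the usual
way. The following sketch is illustrative rather than part of the
original example; it assumes the observations live in vectors ``xs``
and ``ys``, and the starting values for ``b`` are placeholders.

.. code-block:: c++

  #include <vector>

  #include "ceres/ceres.h"

  void SolveRat43(const std::vector<double>& xs,
                  const std::vector<double>& ys) {
    double b[4] = {100.0, 10.0, 1.0, 1.0};  // placeholder initial guess

    // One residual block per observation; the problem takes ownership
    // of the cost functions.
    ceres::Problem problem;
    for (size_t i = 0; i < xs.size(); ++i) {
      problem.AddResidualBlock(
          new Rat43AnalyticOptimized(xs[i], ys[i]),
          /* loss_function = */ nullptr,
          b);
    }

    ceres::Solver::Options options;
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);
  }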

What is the difference in performance of these two implementations?

========================== =========
CostFunction               Time (ns)
========================== =========
Rat43Analytic                    255
Rat43AnalyticOptimized            92
========================== =========

``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
``Rat43Analytic``. This difference in run-time is not uncommon. To
get the best performance out of analytically computed derivatives, one
usually needs to optimize the code to account for common
sub-expressions.
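
Timing differences like these are straightforward to measure with a
micro-benchmark. The sketch below is a simple illustration, not the
benchmark used to produce the table above; the parameter values are
placeholders, and a real benchmark would also need to keep the
compiler from optimizing the loop away.

.. code-block:: c++

  #include <chrono>

  #include "ceres/ceres.h"

  // Returns the average time per call to Evaluate() in nanoseconds.
  double TimeEvaluate(const ceres::CostFunction& cost_function,
                      const int num_iterations) {
    double b[4] = {700.0, 5.0, 0.75, 1.3};  // placeholder parameters
    double* parameter_blocks[1] = {b};
    double residual;
    double jacobian[4];
    double* jacobian_blocks[1] = {jacobian};

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < num_iterations; ++i) {
      cost_function.Evaluate(parameter_blocks, &residual, jacobian_blocks);
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() /
           num_iterations;
  }

  // Usage:
  //   Rat43Analytic baseline(1.0, 16.0);
  //   Rat43AnalyticOptimized optimized(1.0, 16.0);
  //   TimeEvaluate(baseline, 1000000);
  //   TimeEvaluate(optimized, 1000000);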


When should you use analytical derivatives?
===========================================

#. The expressions are simple, e.g. mostly linear.

#. A computer algebra system like `Maple
   <https://www.maplesoft.com/products/maple/>`_, `Mathematica
   <https://www.wolfram.com/mathematica/>`_, or `SymPy
   <http://www.sympy.org/en/index.html>`_ can be used to differentiate
   the objective function symbolically and to generate the C++ code
   that evaluates the resulting derivatives.

#. Performance is of utmost concern and there is algebraic structure
   in the terms that you can exploit to get better performance than
   automatic differentiation.

   That said, getting the best performance out of analytical
   derivatives requires a non-trivial amount of work. Before going
   down this path, it is useful to measure the amount of time being
   spent evaluating the Jacobian as a fraction of the total solve time
   and remember that `Amdahl's Law
   <https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend: if
   the Jacobian accounts for a fraction :math:`p` of the solve time,
   optimizing it can speed up the solve by at most a factor of
   :math:`1/(1-p)`.

#. There is no other way to compute the derivatives, e.g. you
   wish to compute the derivative of the root of a polynomial:

   .. math::
     a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0

   with respect to :math:`x` and :math:`y`. This requires the use of
   the `Inverse Function Theorem
   <https://en.wikipedia.org/wiki/Inverse_function_theorem>`_; a
   sketch of the resulting expression follows this list.

#. You love the chain rule and actually enjoy doing all the algebra by
   hand.

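For the polynomial root example above, the derivative follows from
differentiating the defining identity: treating :math:`z` as a
function :math:`z(x, y)` and applying the chain rule with respect to
:math:`x` gives

.. math::
  D_x z = -\frac{D_x a_3 z^3 + D_x a_2 z^2 + D_x a_1 z + D_x a_0}
                {3 a_3 z^2 + 2 a_2 z + a_1},

which is valid wherever the denominator, the derivative of the
polynomial with respect to :math:`z`, is non-zero; the expression for
:math:`D_y z` is analogous.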

.. rubric:: Footnotes

.. [#f1] The notion of best fit depends on the choice of the objective
         function used to measure the quality of fit, which in turn
         depends on the underlying noise process which generated the
         observations. Minimizing the sum of squared differences is
         the right thing to do when the noise is `Gaussian
         <https://en.wikipedia.org/wiki/Normal_distribution>`_. In
         that case the optimal value of the parameters is the `Maximum
         Likelihood Estimate
         <https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.