.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-automatic_derivatives:

=====================
Automatic Derivatives
=====================

We will now consider automatic differentiation. It is a technique that
can compute exact derivatives, fast, while requiring about the same
effort from the user as is needed to use numerical differentiation.

Don't believe me? Well here goes. The following code fragment
implements an automatically differentiated ``CostFunction`` for `Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_.

.. code-block:: c++

  struct Rat43CostFunctor {
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

    template <typename T>
    bool operator()(const T* parameters, T* residuals) const {
      const T b1 = parameters[0];
      const T b2 = parameters[1];
      const T b3 = parameters[2];
      const T b4 = parameters[3];
      residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
      return true;
    }

   private:
    const double x_;
    const double y_;
  };


  CostFunction* cost_function =
      new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
          new Rat43CostFunctor(x, y));

Notice that compared to numeric differentiation, the only difference
when defining the functor for use with automatic differentiation is
the signature of the ``operator()``.

In the case of numeric differentiation it was

.. code-block:: c++

  bool operator()(const double* parameters, double* residuals) const;

and for automatic differentiation it is a templated function of the
form

.. code-block:: c++

  template <typename T> bool operator()(const T* parameters, T* residuals) const;


So what does this small change buy us? The following table compares
the time it takes to evaluate the residual and the Jacobian for
`Rat43` using various methods.

========================== =========
CostFunction               Time (ns)
========================== =========
Rat43Analytic                    255
Rat43AnalyticOptimized            92
Rat43NumericDiffForward          262
Rat43NumericDiffCentral          517
Rat43NumericDiffRidders         3760
Rat43AutomaticDiff               129
========================== =========

We can get exact derivatives using automatic differentiation
(``Rat43AutomaticDiff``) with about the same effort that is required
to write the code for numeric differentiation but only :math:`40\%`
slower than hand optimized analytical derivatives.

So how does it work? For this we will have to learn about **Dual
Numbers** and **Jets**.


Dual Numbers & Jets
===================

.. NOTE::

   Reading this and the next section on implementing Jets is not
   necessary to use automatic differentiation in Ceres Solver. But
   knowing the basics of how Jets work is useful when debugging and
   reasoning about the performance of automatic differentiation.

Dual numbers are an extension of the real numbers analogous to complex
numbers: whereas complex numbers augment the reals by introducing an
imaginary unit :math:`\iota` such that :math:`\iota^2 = -1`, dual
numbers introduce an *infinitesimal* unit :math:`\epsilon` such that
:math:`\epsilon^2 = 0`. A dual number :math:`a + v\epsilon` has two
components, the *real* component :math:`a` and the *infinitesimal*
component :math:`v`.

Surprisingly, this simple change leads to a convenient method for
computing exact derivatives without needing to manipulate complicated
symbolic expressions.

For example, consider the function

.. math::

   f(x) = x^2 .

Then,

.. math::

   \begin{align}
   f(10 + \epsilon) &= (10 + \epsilon)^2\\
                    &= 100 + 20 \epsilon + \epsilon^2\\
                    &= 100 + 20 \epsilon
   \end{align}

Observe that the coefficient of :math:`\epsilon` is :math:`Df(10) =
20`. Indeed this generalizes to functions which are not
polynomial. Consider an arbitrary differentiable function
:math:`f(x)`. Then we can evaluate :math:`f(x + \epsilon)` by
considering the Taylor expansion of :math:`f` near :math:`x`, which
gives us the infinite series

.. math::

   \begin{align}
   f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x)
   \frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\
   f(x + \epsilon) &= f(x) + Df(x) \epsilon
   \end{align}

Here we are using the fact that :math:`\epsilon^2 = 0`.
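
Dual number arithmetic is simple enough to fit in a few lines of
code. The following is a minimal sketch (the ``Dual`` type and its
lone overloaded operator are purely illustrative, not part of Ceres)
that recovers :math:`Df(10) = 20` for :math:`f(x) = x^2`:

.. code-block:: c++

  #include <iostream>

  // A scalar dual number a + v * epsilon, with epsilon^2 = 0.
  struct Dual {
    double a;  // real part
    double v;  // infinitesimal part
  };

  // (a + v e)(b + w e) = ab + (aw + vb) e; the e^2 term vanishes.
  Dual operator*(const Dual& f, const Dual& g) {
    return Dual{f.a * g.a, f.a * g.v + f.v * g.a};
  }

  int main() {
    const Dual x{10.0, 1.0};  // x = 10 + epsilon
    const Dual y = x * x;     // f(x) = x^2
    std::cout << y.a << " " << y.v << "\n";  // prints "100 20"
    return 0;
  }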

A `Jet <https://en.wikipedia.org/wiki/Jet_(mathematics)>`_ is an
:math:`n`-dimensional dual number, where we augment the real numbers
with :math:`n` infinitesimal units :math:`\epsilon_i,\ i=1,...,n` with
the property that :math:`\forall i, j\ :\epsilon_i\epsilon_j = 0`. Then
a Jet consists of a *real* part :math:`a` and an :math:`n`-dimensional
*infinitesimal* part :math:`\mathbf{v}`, i.e.,

.. math::

   x = a + \sum_j v_{j} \epsilon_j

The summation notation gets tedious, so we will also just write

.. math::

   x = a + \mathbf{v},

where the :math:`\epsilon_i`'s are implicit. Then, using the same
Taylor series expansion used above, we can see that:

.. math::

   f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.

Similarly for a multivariate function
:math:`f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m`, evaluated on
:math:`x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n`:

.. math::

   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i

So if each :math:`\mathbf{v}_i = e_i` were the :math:`i^{\text{th}}`
standard basis vector, then the above expression would simplify to

.. math::

   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i

and we can extract the coordinates of the Jacobian by inspecting the
coefficients of :math:`\epsilon_i`.
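
To make the bookkeeping concrete, take :math:`f(x_1, x_2) = x_1 x_2`
and seed the inputs with the standard basis vectors as above. Then,
dropping all products of infinitesimals,

.. math::

   \begin{align}
   f(a_1 + \epsilon_1, a_2 + \epsilon_2) &= (a_1 + \epsilon_1)(a_2 + \epsilon_2)\\
   &= a_1 a_2 + a_2 \epsilon_1 + a_1 \epsilon_2,
   \end{align}

and the coefficients of :math:`\epsilon_1` and :math:`\epsilon_2` are
exactly the partial derivatives :math:`D_1 f = a_2` and :math:`D_2 f =
a_1`.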

Implementing Jets
-----------------

In order for the above to work in practice, we will need the ability
to evaluate an arbitrary function :math:`f` not just on real numbers
but also on dual numbers; however, one does not usually evaluate
functions by evaluating their Taylor expansions.

This is where C++ templates and operator overloading come into
play. The following code fragment has a simple implementation of a
``Jet`` and some operators/functions that operate on them.

.. code-block:: c++

  #include <cmath>
  #include <Eigen/Core>

  template <int N> struct Jet {
    double a;                        // The real part.
    Eigen::Matrix<double, 1, N> v;   // The infinitesimal part.

    Jet() : a(0.0) { v.setZero(); }
    Jet(double value, const Eigen::Matrix<double, 1, N>& deriv)
        : a(value), v(deriv) {}
  };

  template <int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a + g.a, f.v + g.v);
  }

  template <int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a - g.a, f.v - g.v);
  }

  template <int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
  }

  template <int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
  }

  template <int N> Jet<N> exp(const Jet<N>& f) {
    return Jet<N>(exp(f.a), exp(f.a) * f.v);
  }

  // This is a simple implementation for illustration purposes; the
  // actual implementation of pow requires careful handling of a number
  // of corner cases.
  template <int N> Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(pow(f.a, g.a),
                  g.a * pow(f.a, g.a - 1.0) * f.v +
                      pow(f.a, g.a) * log(f.a) * g.v);
  }
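
As a quick sanity check of this toy ``Jet`` (a sketch, not the actual
Ceres implementation), we can differentiate :math:`e^x` at :math:`x =
2` by seeding the infinitesimal part with one:

.. code-block:: c++

  Jet<1> x(2.0, Eigen::Matrix<double, 1, 1>::Ones());  // x = 2 + epsilon
  Jet<1> y = exp(x);
  // y.a is exp(2), the value of the function, and y.v[0] is also
  // exp(2), the derivative of exp at x = 2.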

With these overloaded functions in hand, we can now call
``Rat43CostFunctor`` with an array of Jets instead of doubles. Putting
that together with appropriately initialized Jets allows us to compute
the Jacobian as follows:

.. code-block:: c++

  class Rat43Automatic : public ceres::SizedCostFunction<1, 4> {
   public:
    Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
    virtual ~Rat43Automatic() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      // Just evaluate the residuals if Jacobians are not required.
      if (!jacobians) return (*functor_)(parameters[0], residuals);

      // Initialize the Jets, seeding the i-th infinitesimal part with
      // the i-th standard basis vector.
      Jet<4> jets[4];
      for (int i = 0; i < 4; ++i) {
        jets[i].a = parameters[0][i];
        jets[i].v.setZero();
        jets[i].v[i] = 1.0;
      }

      Jet<4> result;
      (*functor_)(jets, &result);

      // Copy the values out of the Jet.
      residuals[0] = result.a;
      for (int i = 0; i < 4; ++i) {
        jacobians[0][i] = result.v[i];
      }
      return true;
    }

   private:
    std::unique_ptr<const Rat43CostFunctor> functor_;
  };

Indeed, this is essentially how :class:`AutoDiffCostFunction` works.
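
For completeness, here is a usage sketch of the hand-rolled wrapper
above; the data point and parameter values are made up for
illustration:

.. code-block:: c++

  double parameters[4] = {100.0, 10.0, 1.0, 1.0};
  double* parameter_blocks[1] = {parameters};
  double residual;
  double jacobian[4];
  double* jacobian_blocks[1] = {jacobian};

  Rat43Automatic cost_function(new Rat43CostFunctor(1.0, 16.08));
  cost_function.Evaluate(parameter_blocks, &residual, jacobian_blocks);
  // residual now holds the residual value and jacobian the four
  // partial derivatives with respect to b1, ..., b4.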


Pitfalls
========

Automatic differentiation frees the user from the burden of computing
and reasoning about the symbolic expressions for the Jacobians, but
this freedom comes at a cost. For example, consider the following
simple functor:

.. code-block:: c++

  struct Functor {
    template <typename T> bool operator()(const T* x, T* residual) const {
      residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
      return true;
    }
  };

Looking at the code for the residual computation, one does not foresee
any problems. However, if we look at the analytical expressions for
the Jacobian:

.. math::

   y &= 1 - \sqrt{x_0^2 + x_1^2}\\
   D_1 y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\
   D_2 y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}

we find that it is an indeterminate form at :math:`x_0 = 0, x_1 =
0`.

There is no single solution to this problem. In some cases one needs
to reason explicitly about the points where indeterminacy may occur
and use alternate expressions using `L'Hopital's rule
<https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
example some of the conversion routines in `rotation.h
<https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_). In
other cases, one may need to regularize the expressions to eliminate
these points.
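
As an illustration of reasoning explicitly about the indeterminate
point, here is one possible way (a sketch, not an official Ceres
recipe) to guard the functor above, by special-casing the origin where
the square root is not differentiable:

.. code-block:: c++

  struct Functor {
    template <typename T> bool operator()(const T* x, T* residual) const {
      const T squared_norm = x[0] * x[0] + x[1] * x[1];
      if (squared_norm > T(0.0)) {
        residual[0] = T(1.0) - sqrt(squared_norm);
      } else {
        // At the origin the residual is exactly 1; assigning a
        // constant here makes the derivative zero, avoiding the 0/0
        // that would otherwise propagate NaNs into the Jacobian.
        residual[0] = T(1.0);
      }
      return true;
    }
  };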