.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-numerical_derivatives:

===================
Numeric derivatives
===================

The other extreme from using analytic derivatives is to use numeric
derivatives. The key observation here is that the process of
differentiating a function :math:`f(x)` w.r.t :math:`x` can be written
as the limiting process:

.. math::
  Df(x) = \lim_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h}


Forward Differences
===================

Now of course one cannot perform the limiting operation numerically
on a computer, so we do the next best thing, which is to choose a
small value of :math:`h` and approximate the derivative as

.. math::
  Df(x) \approx \frac{f(x + h) - f(x)}{h}


The above formula is the simplest and most basic form of numeric
differentiation. It is known as the *Forward Difference* formula.
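
As a quick sanity check of the formula (a minimal standalone sketch,
not part of Ceres), one can compare the forward difference estimate
against a function whose derivative is known in closed form, e.g.,
:math:`\sin x`:

.. code-block:: c++

  #include <cmath>
  #include <cstdio>

  // Forward difference estimate of Df(x) for a univariate function f.
  template <typename F>
  double ForwardDifference(F f, double x, double h) {
    return (f(x + h) - f(x)) / h;
  }

  int main() {
    const double x = 1.0;
    const double h = 1e-6;
    const double estimate =
        ForwardDifference([](double t) { return std::sin(t); }, x, h);
    // The exact derivative is cos(1.0); the estimate agrees to ~1e-6.
    std::printf("estimate = %.12f  exact = %.12f\n", estimate, std::cos(x));
    return 0;
  }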

So how would one go about constructing a numerically differentiated
version of ``Rat43Analytic`` (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) in
Ceres Solver? This is done in two steps:

 1. Define a *functor* that, given the parameter values, will evaluate
    the residual for a given :math:`(x,y)`.
 2. Construct a :class:`CostFunction` by using
    :class:`NumericDiffCostFunction` to wrap an instance of
    ``Rat43CostFunctor``.

.. code-block:: c++

  struct Rat43CostFunctor {
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

    bool operator()(const double* parameters, double* residuals) const {
      const double b1 = parameters[0];
      const double b2 = parameters[1];
      const double b3 = parameters[2];
      const double b4 = parameters[3];
      residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
      return true;
    }

    const double x_;
    const double y_;
  };

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
      new Rat43CostFunctor(x, y));

This is about the minimum amount of work one can expect to do to
define the cost function. The only thing that the user needs to do is
to make sure that the evaluation of the residual is implemented
correctly and efficiently.
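
For context, here is a sketch of how the wrapped cost function might
be used to build and solve a fitting problem. The data arrays
``x_data`` and ``y_data``, the observation count ``kNumObservations``,
and the initial values in ``b`` are hypothetical placeholders for the
Rat43 dataset and parameters, and the usual Ceres headers are assumed:

.. code-block:: c++

  // Hypothetical initial guess for b1..b4.
  double b[4] = {100.0, 10.0, 1.0, 1.0};

  Problem problem;
  for (int i = 0; i < kNumObservations; ++i) {
    CostFunction* cost_function =
        new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
            new Rat43CostFunctor(x_data[i], y_data[i]));
    // The problem takes ownership of the cost function.
    problem.AddResidualBlock(cost_function, nullptr, b);
  }

  Solver::Options options;
  Solver::Summary summary;
  Solve(options, &problem, &summary);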

Before going further, it is instructive to get an estimate of the
error in the forward difference formula. We do this by considering the
`Taylor expansion <https://en.wikipedia.org/wiki/Taylor_series>`_ of
:math:`f` near :math:`x`.

.. math::
  \begin{align}
  f(x+h) &= f(x) + h Df(x) + \frac{h^2}{2!} D^2f(x) +
  \frac{h^3}{3!}D^3f(x) + \cdots \\
  Df(x) &= \frac{f(x + h) - f(x)}{h} - \left [\frac{h}{2!}D^2f(x) +
  \frac{h^2}{3!}D^3f(x) + \cdots \right]\\
  Df(x) &= \frac{f(x + h) - f(x)}{h} + O(h)
  \end{align}

i.e., the error in the forward difference formula is
:math:`O(h)` [#f4]_.


Implementation Details
----------------------

:class:`NumericDiffCostFunction` implements a generic algorithm to
numerically differentiate a given functor. While the actual
implementation of :class:`NumericDiffCostFunction` is complicated, the
net result is a :class:`CostFunction` that roughly looks something
like the following:

.. code-block:: c++

  class Rat43NumericDiffForward : public SizedCostFunction<1,4> {
   public:
    Rat43NumericDiffForward(Rat43CostFunctor* functor) : functor_(functor) {}
    virtual ~Rat43NumericDiffForward() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      (*functor_)(parameters[0], residuals);
      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      const double f = residuals[0];
      double parameters_plus_h[4];
      for (int i = 0; i < 4; ++i) {
        std::copy(parameters[0], parameters[0] + 4, parameters_plus_h);
        const double kRelativeStepSize = 1e-6;
        const double h = std::abs(parameters[0][i]) * kRelativeStepSize;
        parameters_plus_h[i] += h;
        double f_plus;
        (*functor_)(parameters_plus_h, &f_plus);
        jacobian[i] = (f_plus - f) / h;
      }
      return true;
    }

   private:
    std::unique_ptr<Rat43CostFunctor> functor_;
  };


Note the choice of step size :math:`h` in the above code: instead of
an absolute step size which is the same for all parameters, we use a
relative step size of :math:`\text{kRelativeStepSize} = 10^{-6}`. This
gives better derivative estimates than an absolute step size [#f2]_
[#f3]_. This choice of step size only works for parameter values that
are not close to zero. So the actual implementation of
:class:`NumericDiffCostFunction` uses a more complex step size
selection logic, which switches to a fixed step size for parameter
values close to zero.
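
A sketch of one such guard, purely for illustration (the constants and
the exact rule inside Ceres differ; ``kRelativeStepSize`` and
``kMinAbsoluteStepSize`` here are assumed values, and ``<cmath>`` and
``<algorithm>`` are assumed to be included):

.. code-block:: c++

  // Relative step size, falling back to a small fixed step when the
  // parameter value is near zero. Illustrative only.
  double StepSize(double parameter_value) {
    const double kRelativeStepSize = 1e-6;
    const double kMinAbsoluteStepSize = 1e-6;
    return std::max(std::abs(parameter_value) * kRelativeStepSize,
                    kMinAbsoluteStepSize);
  }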


Central Differences
===================

The :math:`O(h)` error in the Forward Difference formula is okay but
not great. A better method is to use the *Central Difference* formula:

.. math::
  Df(x) \approx \frac{f(x + h) - f(x - h)}{2h}

Notice that if the value of :math:`f(x)` is known, the Forward
Difference formula only requires one extra evaluation, but the Central
Difference formula requires two evaluations, making it twice as
expensive. So is the extra evaluation worth it?

To answer this question, we again compute the error of approximation
in the central difference formula:

.. math::
  \begin{align}
  f(x + h) &= f(x) + h Df(x) + \frac{h^2}{2!}
  D^2f(x) + \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) + \cdots\\
  f(x - h) &= f(x) - h Df(x) + \frac{h^2}{2!}
  D^2f(x) - \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) +
  \cdots\\
  Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!}
  D^3f(x) + \frac{h^4}{5!}
  D^5f(x) + \cdots \\
  Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + O(h^2)
  \end{align}

The error of the Central Difference formula is :math:`O(h^2)`, i.e.,
the error goes down quadratically whereas the error in the Forward
Difference formula only goes down linearly.

Using central differences instead of forward differences in Ceres
Solver is a simple matter of changing a template argument to
:class:`NumericDiffCostFunction` as follows:

.. code-block:: c++

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, CENTRAL, 1, 4>(
      new Rat43CostFunctor(x, y));

But what do these differences in the error mean in practice? To see
this, consider the problem of evaluating the derivative of the
univariate function

.. math::
  f(x) = \frac{e^x}{\sin x - x^2},

at :math:`x = 1.0`.

It is easy to determine that :math:`Df(1.0) =
140.73773557129658`. Using this value as reference, we can now compute
the relative error in the forward and central difference formulae as a
function of the absolute step size and plot them.
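
The data behind such a plot can be generated with a short standalone
program along the following lines (a sketch, not part of Ceres; the
reference value is the one quoted above):

.. code-block:: c++

  #include <cmath>
  #include <cstdio>

  double f(double x) { return std::exp(x) / (std::sin(x) - x * x); }

  int main() {
    const double x = 1.0;
    const double reference = 140.73773557129658;  // Df(1.0) from above.
    // Sweep the absolute step size over several orders of magnitude and
    // print the relative error of the two formulae.
    for (double h = 1e-2; h > 1e-12; h /= 10.0) {
      const double forward = (f(x + h) - f(x)) / h;
      const double central = (f(x + h) - f(x - h)) / (2.0 * h);
      std::printf("h = %.0e  forward = %.3e  central = %.3e\n",
                  h,
                  std::abs(forward - reference) / std::abs(reference),
                  std::abs(central - reference) / std::abs(reference));
    }
    return 0;
  }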

.. figure:: forward_central_error.png
   :figwidth: 100%
   :align: center

Reading the graph from right to left, a number of things stand out:

 1. The graphs for both formulae have two distinct regions. At first,
    starting from a large value of :math:`h` the error goes down as
    the effect of truncating the Taylor series dominates, but as the
    value of :math:`h` continues to decrease, the error starts
    increasing again as roundoff error starts to dominate the
    computation. So we cannot just keep on reducing the value of
    :math:`h` to get better estimates of :math:`Df`. The fact that we
    are using finite precision arithmetic becomes a limiting factor.
 2. The Forward Difference formula is not a great method for
    evaluating derivatives. The Central Difference formula converges
    much more quickly to a more accurate estimate of the derivative
    with decreasing step size. So unless the evaluation of
    :math:`f(x)` is so expensive that you absolutely cannot afford the
    extra evaluation required by central differences, **do not use the
    Forward Difference formula**.
 3. Neither formula works well for a poorly chosen value of :math:`h`.


Ridders' Method
===============

So, can we get better estimates of :math:`Df` without requiring such
small values of :math:`h` that we start hitting floating point
roundoff errors?

One possible approach is to find a method whose error goes down faster
than :math:`O(h^2)`. This can be done by applying `Richardson
Extrapolation
<https://en.wikipedia.org/wiki/Richardson_extrapolation>`_ to the
problem of differentiation. This is also known as *Ridders' Method*
[Ridders]_.

Let us recall the error in the central differences formula:

.. math::
  \begin{align}
  Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!}
  D^3f(x) + \frac{h^4}{5!}
  D^5f(x) + \cdots\\
  & = \frac{f(x + h) - f(x - h)}{2h} + K_2 h^2 + K_4 h^4 + \cdots
  \end{align}

The key thing to note here is that the terms :math:`K_2, K_4, \dots`
are independent of :math:`h` and only depend on :math:`x`.

Let us now define:

.. math::

  A(1, m) = \frac{f(x + h/2^{m-1}) - f(x - h/2^{m-1})}{2h/2^{m-1}}.

Then observe that

.. math::

  Df(x) = A(1,1) + K_2 h^2 + K_4 h^4 + \cdots

and

.. math::

  Df(x) = A(1, 2) + K_2 (h/2)^2 + K_4 (h/2)^4 + \cdots

Here we have halved the step size to obtain a second central
differences estimate of :math:`Df(x)`. Combining these two estimates,
we get:

.. math::

  Df(x) = \frac{4 A(1, 2) - A(1,1)}{4 - 1} + O(h^4)

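To see why the :math:`h^2` term drops out, substitute the two series
above into this combination:

.. math::

  \begin{align}
  \frac{4 A(1, 2) - A(1,1)}{4 - 1} &=
  \frac{4\left(Df(x) - K_2 \frac{h^2}{4} - K_4 \frac{h^4}{16} - \cdots\right)
  - \left(Df(x) - K_2 h^2 - K_4 h^4 - \cdots\right)}{3} \\
  &= Df(x) + \frac{K_4}{4} h^4 + \cdots
  \end{align}
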
Thus, combining the two estimates gives an approximation of
:math:`Df(x)` whose truncation error goes down as :math:`O(h^4)`. But
we do not have to stop here. We can iterate this process to obtain
even more accurate estimates as follows:

.. math::

  A(n, m) = \begin{cases}
   \frac{\displaystyle f(x + h/2^{m-1}) - f(x -
   h/2^{m-1})}{\displaystyle 2h/2^{m-1}} & n = 1 \\
   \frac{\displaystyle 4^{n-1} A(n - 1, m + 1) - A(n - 1, m)}{\displaystyle 4^{n-1} - 1} & n > 1
  \end{cases}

It is straightforward to show that the approximation error in
:math:`A(n, 1)` is :math:`O(h^{2n})`. To see how the above formula can
be implemented in practice to compute :math:`A(n,1)` it is helpful to
structure the computation as the following tableau:

.. math::
  \begin{array}{ccccc}
  A(1,1) & A(1, 2) & A(1, 3) & A(1, 4) & \cdots\\
  & A(2, 1) & A(2, 2) & A(2, 3) & \cdots\\
  & & A(3, 1) & A(3, 2) & \cdots\\
  & & & A(4, 1) & \cdots \\
  & & & & \ddots
  \end{array}

So, to compute :math:`A(n, 1)` for increasing values of :math:`n` we
move from the left to the right, computing one column at a
time. Assuming that the primary cost here is the evaluation of the
function :math:`f(x)`, the cost of adding a new column of the above
tableau is two function evaluations, since evaluating :math:`A(1, m)`
requires evaluating the central difference formula with step size
:math:`2^{1-m}h`, and every other entry in the column follows from the
recurrence without any further evaluations of :math:`f`.
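
The following sketch shows this tableau computation for a univariate
function. It is illustrative only; ``RiddersDerivative`` is not a
Ceres function, and the Ridders implementation inside Ceres is
adaptive and handles error control more carefully:

.. code-block:: c++

  #include <cmath>
  #include <cstdio>
  #include <vector>

  // Compute A(max_order, 1) by filling the tableau one column at a time.
  // Each new column costs exactly two evaluations of f (one new central
  // difference); the remaining entries follow from the recurrence.
  template <typename F>
  double RiddersDerivative(F f, double x, double h, int max_order) {
    std::vector<std::vector<double>> A(
        max_order + 1, std::vector<double>(max_order + 1, 0.0));
    for (int m = 1; m <= max_order; ++m) {
      const double step = h / std::pow(2.0, m - 1);
      A[1][m] = (f(x + step) - f(x - step)) / (2.0 * step);
      for (int n = 2; n <= m; ++n) {
        const double factor = std::pow(4.0, n - 1);
        A[n][m - n + 1] =
            (factor * A[n - 1][m - n + 2] - A[n - 1][m - n + 1]) /
            (factor - 1.0);
      }
    }
    return A[max_order][1];
  }

  int main() {
    auto f = [](double x) { return std::exp(x) / (std::sin(x) - x * x); };
    std::printf("A(5, 1) = %.12f\n", RiddersDerivative(f, 1.0, 0.01, 5));
    return 0;
  }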

Applying this method to :math:`f(x) = \frac{e^x}{\sin x - x^2}`
starting with a fairly large step size :math:`h = 0.01`, we get:

.. math::
  \begin{array}{rrrrr}
  141.678097131 &140.971663667 &140.796145400 &140.752333523 &140.741384778\\
  &140.736185846 &140.737639311 &140.737729564 &140.737735196\\
  & &140.737736209 &140.737735581 &140.737735571\\
  & & &140.737735571 &140.737735571\\
  & & & &140.737735571\\
  \end{array}

Compared to the *correct* value :math:`Df(1.0) = 140.73773557129658`,
:math:`A(5, 1)` has a relative error of :math:`10^{-13}`. For
comparison, the relative error for the central difference formula with
the same step size (:math:`0.01/2^4 = 0.000625`) is :math:`10^{-5}`.

The above tableau is the basis of Ridders' method for numeric
differentiation. The full implementation is an adaptive scheme that
tracks its own estimation error and stops automatically when the
desired precision is reached. Of course it is more expensive than the
forward and central difference formulae, but is also significantly
more robust and accurate.

Using Ridders' method instead of forward or central differences in
Ceres is again a simple matter of changing a template argument to
:class:`NumericDiffCostFunction` as follows:

.. code-block:: c++

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, RIDDERS, 1, 4>(
      new Rat43CostFunctor(x, y));

The following graph shows the relative error of the three methods as a
function of the absolute step size. For Ridders' method we assume that
the step size for evaluating :math:`A(n,1)` is :math:`2^{1-n}h`.

.. figure:: forward_central_ridders_error.png
   :figwidth: 100%
   :align: center

Using the 10 function evaluations that are needed to compute
:math:`A(5,1)` we are able to approximate :math:`Df(1.0)` about 1000
times better than the best central differences estimate. To put these
numbers in perspective, machine epsilon for double precision
arithmetic is :math:`\approx 2.22 \times 10^{-16}`.

Going back to ``Rat43``, let us also look at the runtime cost of the
various methods for computing numeric derivatives.

========================== =========
CostFunction               Time (ns)
========================== =========
Rat43Analytic                    255
Rat43AnalyticOptimized            92
Rat43NumericDiffForward          262
Rat43NumericDiffCentral          517
Rat43NumericDiffRidders         3760
========================== =========

As expected, Central Differences is about twice as expensive as
Forward Differences and the remarkable accuracy improvements of
Ridders' method cost an order of magnitude more runtime.

Recommendations
===============

Numeric differentiation should be used when you cannot compute the
derivatives either analytically or using automatic differentiation.
This is usually the case when you are calling an external library or
function whose analytic form you do not know, or when, even if you do
know it, you are not in a position to re-write it in the manner
required to use :ref:`chapter-automatic_derivatives`.


When using numeric differentiation, use at least Central Differences,
and if execution time is not a concern or the objective function is
such that determining a good static relative step size is hard,
Ridders' method is recommended.

.. rubric:: Footnotes

.. [#f2] `Numerical Differentiation
         <https://en.wikipedia.org/wiki/Numerical_differentiation#Practical_considerations_using_floating_point_arithmetic>`_
.. [#f3] [Press]_ Numerical Recipes, Section 5.7
.. [#f4] In asymptotic error analysis, an error of :math:`O(h^k)`
         means that the absolute value of the error is at most some
         constant times :math:`h^k` when :math:`h` is close enough to
         :math:`0`.