.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-numerical_derivatives:

===================
Numeric derivatives
===================

The other extreme from using analytic derivatives is to use numeric
derivatives. The key observation here is that the process of
differentiating a function :math:`f(x)` w.r.t :math:`x` can be written
as the limiting process:

.. math::
   Df(x) = \lim_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h}


Forward Differences
===================

Now of course one cannot perform the limiting operation numerically on
a computer so we do the next best thing, which is to choose a small
value of :math:`h` and approximate the derivative as

.. math::
   Df(x) \approx \frac{f(x + h) - f(x)}{h}


The above formula is the simplest most basic form of numeric
differentiation. It is known as the *Forward Difference* formula.
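
To make the formula concrete, here is a minimal sketch (our own helper, not
part of Ceres Solver) that applies the forward difference approximation to an
arbitrary callable:

.. code-block:: c++

  #include <cstdio>

  // Forward difference approximation to Df(x) for an arbitrary callable f.
  template <typename Function>
  double ForwardDifference(const Function& f, double x, double h) {
    return (f(x + h) - f(x)) / h;
  }

  int main() {
    // Df(x) of f(x) = x^2 at x = 3 is exactly 6; the forward difference
    // returns 6 + h, i.e. approximately 6.000001 for h = 1e-6.
    const double df = ForwardDifference([](double x) { return x * x; }, 3.0, 1e-6);
    std::printf("%.9f\n", df);
    return 0;
  }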

So how would one go about constructing a numerically differentiated
version of ``Rat43Analytic`` (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) in
Ceres Solver? This is done in two steps:

 1. Define a *functor* that, given the parameter values, will evaluate
    the residual for a given :math:`(x,y)`.
 2. Construct a :class:`CostFunction` by using
    :class:`NumericDiffCostFunction` to wrap an instance of
    ``Rat43CostFunctor``.

.. code-block:: c++

  struct Rat43CostFunctor {
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

    bool operator()(const double* parameters, double* residuals) const {
      const double b1 = parameters[0];
      const double b2 = parameters[1];
      const double b3 = parameters[2];
      const double b4 = parameters[3];
      residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
      return true;
    }

    const double x_;
    const double y_;
  };

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
      new Rat43CostFunctor(x, y));

This is about the minimum amount of work one can expect to do to
define the cost function. The only thing that the user needs to do is
to make sure that the evaluation of the residual is implemented
correctly and efficiently.
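
For completeness, here is a hedged sketch of how such a cost function is
typically wired into a :class:`Problem`. The data arrays ``x_data`` and
``y_data``, the observation count ``kNumObservations``, and the initial
values of the parameter block ``b`` are placeholders introduced purely for
illustration:

.. code-block:: c++

  // Sketch: adding one numerically differentiated residual per observation.
  // The data arrays, observation count and initial parameter values below
  // are hypothetical.
  double b[4] = {100.0, 10.0, 1.0, 1.0};
  Problem problem;
  for (int i = 0; i < kNumObservations; ++i) {
    CostFunction* cost_function =
        new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
            new Rat43CostFunctor(x_data[i], y_data[i]));
    problem.AddResidualBlock(cost_function, nullptr, b);
  }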

Before going further, it is instructive to get an estimate of the
error in the forward difference formula. We do this by considering the
`Taylor expansion <https://en.wikipedia.org/wiki/Taylor_series>`_ of
:math:`f` near :math:`x`.

.. math::
   \begin{align}
   f(x+h) &= f(x) + h Df(x) + \frac{h^2}{2!} D^2f(x) +
   \frac{h^3}{3!}D^3f(x) + \cdots \\
   Df(x) &= \frac{f(x + h) - f(x)}{h} - \left [\frac{h}{2!}D^2f(x) +
   \frac{h^2}{3!}D^3f(x) + \cdots \right]\\
   Df(x) &= \frac{f(x + h) - f(x)}{h} + O(h)
   \end{align}

i.e., the error in the forward difference formula is
:math:`O(h)` [#f4]_.


Implementation Details
----------------------

:class:`NumericDiffCostFunction` implements a generic algorithm to
numerically differentiate a given functor. While the actual
implementation of :class:`NumericDiffCostFunction` is complicated, the
net result is a :class:`CostFunction` that roughly looks something
like the following:

.. code-block:: c++

  class Rat43NumericDiffForward : public SizedCostFunction<1, 4> {
   public:
    Rat43NumericDiffForward(Rat43CostFunctor* functor) : functor_(functor) {}
    virtual ~Rat43NumericDiffForward() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      (*functor_)(parameters[0], residuals);
      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      const double f = residuals[0];
      double parameters_plus_h[4];
      for (int i = 0; i < 4; ++i) {
        std::copy(parameters[0], parameters[0] + 4, parameters_plus_h);
        const double kRelativeStepSize = 1e-6;
        const double h = std::abs(parameters[0][i]) * kRelativeStepSize;
        parameters_plus_h[i] += h;
        double f_plus;
        (*functor_)(parameters_plus_h, &f_plus);
        jacobian[i] = (f_plus - f) / h;
      }
      return true;
    }

   private:
    std::unique_ptr<Rat43CostFunctor> functor_;
  };

Note the choice of step size :math:`h` in the above code: instead of
an absolute step size which is the same for all parameters, we use a
relative step size of :math:`\text{kRelativeStepSize} = 10^{-6}`. This
gives better derivative estimates than an absolute step size [#f2]_
[#f3]_. This choice of step size only works for parameter values that
are not close to zero. So the actual implementation of
:class:`NumericDiffCostFunction` uses a more complex step size
selection logic where, close to zero, it switches to a fixed step
size.
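
A minimal sketch of this kind of hybrid rule (illustrative only; the actual
logic and constants inside Ceres Solver differ) might look like:

.. code-block:: c++

  #include <algorithm>
  #include <cmath>

  // Use a relative step away from zero and fall back to a fixed absolute
  // step when the parameter is too small for a relative step to be useful.
  double StepSize(double parameter) {
    const double kRelativeStepSize = 1e-6;
    const double kMinAbsoluteStepSize = 1e-6;
    return std::max(std::abs(parameter) * kRelativeStepSize,
                    kMinAbsoluteStepSize);
  }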


Central Differences
===================

The :math:`O(h)` error in the Forward Difference formula is okay but
not great. A better method is to use the *Central Difference* formula:

.. math::
   Df(x) \approx \frac{f(x + h) - f(x - h)}{2h}

Notice that if the value of :math:`f(x)` is known, the Forward
Difference formula only requires one extra evaluation, but the Central
Difference formula requires two evaluations, making it twice as
expensive. So is the extra evaluation worth it?

To answer this question, we again compute the error of approximation
in the central difference formula:

.. math::
   \begin{align}
   f(x + h) &= f(x) + h Df(x) + \frac{h^2}{2!}
   D^2f(x) + \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) + \cdots\\
   f(x - h) &= f(x) - h Df(x) + \frac{h^2}{2!}
   D^2f(x) - \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) +
   \cdots\\
   Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!}
   D^3f(x) + \frac{h^4}{5!}
   D^5f(x) + \cdots \\
   Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + O(h^2)
   \end{align}

The error of the Central Difference formula is :math:`O(h^2)`, i.e.,
the error goes down quadratically whereas the error in the Forward
Difference formula only goes down linearly.

Using central differences instead of forward differences in Ceres
Solver is a simple matter of changing a template argument to
:class:`NumericDiffCostFunction` as follows:

.. code-block:: c++

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, CENTRAL, 1, 4>(
      new Rat43CostFunctor(x, y));

But what do these differences in the error mean in practice? To see
this, consider the problem of evaluating the derivative of the
univariate function

.. math::
   f(x) = \frac{e^x}{\sin x - x^2},

at :math:`x = 1.0`.

It is easy to determine that :math:`Df(1.0) =
140.73773557129658`. Using this value as reference, we can now compute
the relative error in the forward and central difference formulae as a
function of the absolute step size and plot them.
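
A short sketch along the following lines (our own code, not necessarily how
the figure below was generated) computes these relative errors:

.. code-block:: c++

  #include <cmath>
  #include <cstdio>

  // f(x) = e^x / (sin(x) - x^2) and the reference value of Df(1.0).
  double f(double x) { return std::exp(x) / (std::sin(x) - x * x); }
  const double kReference = 140.73773557129658;

  int main() {
    const double x = 1.0;
    // Sweep the absolute step size and report the relative errors of the
    // forward and central difference estimates of Df(1.0).
    for (double h = 1e-1; h > 1e-12; h /= 10.0) {
      const double forward = (f(x + h) - f(x)) / h;
      const double central = (f(x + h) - f(x - h)) / (2.0 * h);
      std::printf("h = %.0e  forward = %.3e  central = %.3e\n", h,
                  std::abs(forward - kReference) / kReference,
                  std::abs(central - kReference) / kReference);
    }
    return 0;
  }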

.. figure:: forward_central_error.png
   :figwidth: 100%
   :align: center

Reading the graph from right to left, a number of things stand out:

 1. The graphs for both formulae have two distinct regions. At first,
    starting from a large value of :math:`h` the error goes down as
    the effect of truncating the Taylor series dominates, but as the
    value of :math:`h` continues to decrease, the error starts
    increasing again as roundoff error starts to dominate the
    computation. So we cannot just keep on reducing the value of
    :math:`h` to get better estimates of :math:`Df`. The fact that we
    are using finite precision arithmetic becomes a limiting factor.
 2. The Forward Difference formula is not a great method for
    evaluating derivatives. The Central Difference formula converges
    much more quickly to a more accurate estimate of the derivative
    with decreasing step size. So unless the evaluation of
    :math:`f(x)` is so expensive that you absolutely cannot afford the
    extra evaluation required by central differences, **do not use the
    Forward Difference formula**.
 3. Neither formula works well for a poorly chosen value of :math:`h`.


Ridders' Method
===============

So, can we get better estimates of :math:`Df` without requiring such
small values of :math:`h` that we start hitting floating point
roundoff errors?

One possible approach is to find a method whose error goes down faster
than :math:`O(h^2)`. This can be done by applying `Richardson
Extrapolation
<https://en.wikipedia.org/wiki/Richardson_extrapolation>`_ to the
problem of differentiation. This is also known as *Ridders' Method*
[Ridders]_.

Let us recall the error in the central differences formula:

.. math::
   \begin{align}
   Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!}
   D^3f(x) + \frac{h^4}{5!}
   D^5f(x) + \cdots\\
   & = \frac{f(x + h) - f(x - h)}{2h} + K_2 h^2 + K_4 h^4 + \cdots
   \end{align}

The key thing to note here is that the terms :math:`K_2, K_4, ...`
are independent of :math:`h` and only depend on :math:`x`.

Let us now define:

.. math::

   A(1, m) = \frac{f(x + h/2^{m-1}) - f(x - h/2^{m-1})}{2h/2^{m-1}}.

Then observe that

.. math::

   Df(x) = A(1,1) + K_2 h^2 + K_4 h^4 + \cdots

and

.. math::

   Df(x) = A(1, 2) + K_2 (h/2)^2 + K_4 (h/2)^4 + \cdots

Here we have halved the step size to obtain a second central
differences estimate of :math:`Df(x)`. Combining these two estimates
by multiplying the second by four and subtracting the first, which
eliminates the :math:`K_2 h^2` term, we get:

.. math::

   Df(x) = \frac{4 A(1, 2) - A(1,1)}{4 - 1} + O(h^4)

which is an approximation of :math:`Df(x)` with truncation error that
goes down as :math:`O(h^4)`. But we do not have to stop here. We can
iterate this process to obtain even more accurate estimates as
follows:

.. math::

   A(n, m) = \begin{cases}
    \frac{\displaystyle f(x + h/2^{m-1}) - f(x -
    h/2^{m-1})}{\displaystyle 2h/2^{m-1}} & n = 1 \\
    \frac{\displaystyle 4^{n-1} A(n - 1, m + 1) - A(n - 1, m)}{\displaystyle 4^{n-1} - 1} & n > 1
   \end{cases}

It is straightforward to show that the approximation error in
:math:`A(n, 1)` is :math:`O(h^{2n})`. To see how the above formula can
be implemented in practice to compute :math:`A(n,1)` it is helpful to
structure the computation as the following tableau:

.. math::
   \begin{array}{ccccc}
   A(1,1) & A(1, 2) & A(1, 3) & A(1, 4) & \cdots\\
   & A(2, 1) & A(2, 2) & A(2, 3) & \cdots\\
   & & A(3, 1) & A(3, 2) & \cdots\\
   & & & A(4, 1) & \cdots \\
   & & & & \ddots
   \end{array}

So, to compute :math:`A(n, 1)` for increasing values of :math:`n` we
move from the left to the right, computing one column at a
time. Assuming that the primary cost here is the evaluation of the
function :math:`f(x)`, the cost of computing a new column of the above
tableau is two function evaluations, since computing :math:`A(1, m)`
requires evaluating the central difference formula for a step size of
:math:`2^{1-m}h`, and every other entry in the column follows from the
recurrence above at negligible cost.
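
The following sketch (our own illustrative code, not the implementation
inside Ceres Solver) fills in the tableau column by column for a univariate
function and returns :math:`A(n, 1)`:

.. code-block:: c++

  #include <cmath>
  #include <cstdio>
  #include <vector>

  // Central difference estimate of Df(x) with step size h.
  template <typename Function>
  double CentralDifference(const Function& f, double x, double h) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
  }

  // Compute A(max_order, 1) by filling the extrapolation tableau one column
  // at a time. Each new column costs two evaluations of f (for its first
  // row); the remaining entries follow from the recurrence.
  template <typename Function>
  double Ridders(const Function& f, double x, double h, int max_order) {
    std::vector<double> column(max_order);  // previous column, top to bottom
    for (int m = 1; m <= max_order; ++m) {
      double estimate = CentralDifference(f, x, h / std::pow(2.0, m - 1));
      for (int n = 2; n <= m; ++n) {
        const double factor = std::pow(4.0, n - 1);
        const double next = (factor * estimate - column[n - 2]) / (factor - 1.0);
        column[n - 2] = estimate;
        estimate = next;
      }
      column[m - 1] = estimate;
    }
    return column[max_order - 1];  // A(max_order, 1)
  }

  int main() {
    auto f = [](double x) { return std::exp(x) / (std::sin(x) - x * x); };
    // With h = 0.01 and five columns this reproduces A(5, 1) from the
    // tableau below, approximately 140.737735571.
    std::printf("%.9f\n", Ridders(f, 1.0, 0.01, 5));
    return 0;
  }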

Applying this method to :math:`f(x) = \frac{e^x}{\sin x - x^2}`
starting with a fairly large step size :math:`h = 0.01`, we get:

.. math::
   \begin{array}{rrrrr}
   141.678097131 &140.971663667 &140.796145400 &140.752333523 &140.741384778\\
   &140.736185846 &140.737639311 &140.737729564 &140.737735196\\
   & &140.737736209 &140.737735581 &140.737735571\\
   & & &140.737735571 &140.737735571\\
   & & & &140.737735571\\
   \end{array}

Compared to the *correct* value :math:`Df(1.0) = 140.73773557129658`,
:math:`A(5, 1)` has a relative error of :math:`10^{-13}`. For
comparison, the relative error for the central difference formula with
the same step size (:math:`0.01/2^4 = 0.000625`) is :math:`10^{-5}`.

The above tableau is the basis of Ridders' method for numeric
differentiation. The full implementation is an adaptive scheme that
tracks its own estimation error and stops automatically when the
desired precision is reached. Of course it is more expensive than the
forward and central difference formulae, but is also significantly
more robust and accurate.

Using Ridders' method instead of forward or central differences in
Ceres is again a simple matter of changing a template argument to
:class:`NumericDiffCostFunction` as follows:

.. code-block:: c++

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, RIDDERS, 1, 4>(
      new Rat43CostFunctor(x, y));

The following graph shows the relative error of the three methods as a
function of the absolute step size. For Ridders' method we assume
that the step size for evaluating :math:`A(n,1)` is :math:`2^{1-n}h`.

.. figure:: forward_central_ridders_error.png
   :figwidth: 100%
   :align: center

Using the 10 function evaluations that are needed to compute
:math:`A(5,1)` we are able to approximate :math:`Df(1.0)` about 1000
times better than the best central differences estimate. To put these
numbers in perspective, machine epsilon for double precision
arithmetic is :math:`\approx 2.22 \times 10^{-16}`.

Going back to ``Rat43``, let us also look at the runtime cost of the
various methods for computing numeric derivatives.

==========================  =========
CostFunction                Time (ns)
==========================  =========
Rat43Analytic                     255
Rat43AnalyticOptimized             92
Rat43NumericDiffForward           262
Rat43NumericDiffCentral           517
Rat43NumericDiffRidders          3760
==========================  =========

As expected, Central Differences is about twice as expensive as
Forward Differences and the remarkable accuracy improvements of
Ridders' method cost an order of magnitude more runtime.

Recommendations
===============

Numeric differentiation should be used when you cannot compute the
derivatives either analytically or using automatic differentiation. This
is usually the case when you are calling an external library or
function whose analytic form you do not know, or even if you do, you
are not in a position to re-write it in the manner required to use
:ref:`chapter-automatic_derivatives`.


When using numeric differentiation, use at least Central Differences,
and if execution time is not a concern or the objective function is
such that determining a good static relative step size is hard,
Ridders' method is recommended.
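
If you do stay with forward or central differences but want to tune the
step size, recent versions of Ceres Solver expose a
:class:`NumericDiffOptions` struct that can be passed to
:class:`NumericDiffCostFunction`. The following is a hedged sketch; verify
the exact constructor arguments and defaults against the version of the
library you are using:

.. code-block:: c++

  // Sketch: overriding the default relative step size. Check the
  // NumericDiffCostFunction constructor signature for your Ceres version.
  NumericDiffOptions options;
  options.relative_step_size = 1e-4;
  CostFunction* cost_function =
      new NumericDiffCostFunction<Rat43CostFunctor, CENTRAL, 1, 4>(
          new Rat43CostFunctor(x, y), TAKE_OWNERSHIP, 1, options);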

.. rubric:: Footnotes

.. [#f2] `Numerical Differentiation
   <https://en.wikipedia.org/wiki/Numerical_differentiation#Practical_considerations_using_floating_point_arithmetic>`_
.. [#f3] [Press]_ Numerical Recipes, Section 5.7
.. [#f4] In asymptotic error analysis, an error of :math:`O(h^k)`
   means that the absolute value of the error is at most some
   constant times :math:`h^k` when :math:`h` is close enough to
   :math:`0`.