.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-numerical_derivatives:

===================
Numeric derivatives
===================

The other extreme from using analytic derivatives is to use numeric
derivatives. The key observation here is that the process of
differentiating a function :math:`f(x)` w.r.t :math:`x` can be written
as the limiting process:

.. math::
   Df(x) = \lim_{h \rightarrow 0} \frac{f(x + h) - f(x)}{h}


Forward Differences
===================

Now of course one cannot perform the limiting operation numerically on
a computer so we do the next best thing, which is to choose a small
value of :math:`h` and approximate the derivative as

.. math::
   Df(x) \approx \frac{f(x + h) - f(x)}{h}


The above formula is the simplest most basic form of numeric
differentiation. It is known as the *Forward Difference* formula.
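
To make the formula concrete, here is a minimal sketch (our own helper, not
part of Ceres Solver) that applies the forward difference approximation to an
arbitrary callable:

.. code-block:: c++

  #include <cstdio>

  // Forward difference approximation to Df(x) for an arbitrary callable f.
  template <typename Function>
  double ForwardDifference(const Function& f, double x, double h) {
    return (f(x + h) - f(x)) / h;
  }

  int main() {
    // Df(x) of f(x) = x^2 at x = 3 is exactly 6; the forward difference
    // returns 6 + h, i.e. approximately 6.000001 for h = 1e-6.
    const double df = ForwardDifference([](double x) { return x * x; }, 3.0, 1e-6);
    std::printf("%.9f\n", df);
    return 0;
  }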

So how would one go about constructing a numerically differentiated
version of ``Rat43Analytic`` (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) in
Ceres Solver? This is done in two steps:

 1. Define a *functor* that, given the parameter values, will evaluate
    the residual for a given :math:`(x,y)`.
 2. Construct a :class:`CostFunction` by using
    :class:`NumericDiffCostFunction` to wrap an instance of
    ``Rat43CostFunctor``.

.. code-block:: c++

  struct Rat43CostFunctor {
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

    bool operator()(const double* parameters, double* residuals) const {
      const double b1 = parameters[0];
      const double b2 = parameters[1];
      const double b3 = parameters[2];
      const double b4 = parameters[3];
      residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
      return true;
    }

    const double x_;
    const double y_;
  };

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
      new Rat43CostFunctor(x, y));

This is about the minimum amount of work one can expect to do to
define the cost function. The only thing that the user needs to do is
to make sure that the evaluation of the residual is implemented
correctly and efficiently.
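
For completeness, here is a hedged sketch of how such a cost function is
typically wired into a :class:`Problem`. The data arrays ``x_data`` and
``y_data``, the observation count ``kNumObservations``, and the initial
values of the parameter block ``b`` are placeholders introduced purely for
illustration:

.. code-block:: c++

  // Sketch: adding one numerically differentiated residual per observation.
  // The data arrays, observation count and initial parameter values below
  // are hypothetical.
  double b[4] = {100.0, 10.0, 1.0, 1.0};
  Problem problem;
  for (int i = 0; i < kNumObservations; ++i) {
    CostFunction* cost_function =
        new NumericDiffCostFunction<Rat43CostFunctor, FORWARD, 1, 4>(
            new Rat43CostFunctor(x_data[i], y_data[i]));
    problem.AddResidualBlock(cost_function, nullptr, b);
  }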

Before going further, it is instructive to get an estimate of the
error in the forward difference formula. We do this by considering the
`Taylor expansion <https://en.wikipedia.org/wiki/Taylor_series>`_ of
:math:`f` near :math:`x`.

.. math::
   \begin{align}
   f(x+h) &= f(x) + h Df(x) + \frac{h^2}{2!} D^2f(x) +
   \frac{h^3}{3!}D^3f(x) + \cdots \\
   Df(x) &= \frac{f(x + h) - f(x)}{h} - \left [\frac{h}{2!}D^2f(x) +
   \frac{h^2}{3!}D^3f(x) + \cdots \right]\\
   Df(x) &= \frac{f(x + h) - f(x)}{h} + O(h)
   \end{align}

i.e., the error in the forward difference formula is
:math:`O(h)` [#f4]_.


Implementation Details
----------------------

:class:`NumericDiffCostFunction` implements a generic algorithm to
numerically differentiate a given functor. While the actual
implementation of :class:`NumericDiffCostFunction` is complicated, the
net result is a :class:`CostFunction` that roughly looks something
like the following:

.. code-block:: c++

  class Rat43NumericDiffForward : public SizedCostFunction<1, 4> {
   public:
    Rat43NumericDiffForward(Rat43CostFunctor* functor) : functor_(functor) {}
    virtual ~Rat43NumericDiffForward() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      (*functor_)(parameters[0], residuals);
      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      const double f = residuals[0];
      double parameters_plus_h[4];
      for (int i = 0; i < 4; ++i) {
        std::copy(parameters[0], parameters[0] + 4, parameters_plus_h);
        const double kRelativeStepSize = 1e-6;
        const double h = std::abs(parameters[0][i]) * kRelativeStepSize;
        parameters_plus_h[i] += h;
        double f_plus;
        (*functor_)(parameters_plus_h, &f_plus);
        jacobian[i] = (f_plus - f) / h;
      }
      return true;
    }

   private:
    std::unique_ptr<Rat43CostFunctor> functor_;
  };

Note the choice of step size :math:`h` in the above code: instead of
an absolute step size which is the same for all parameters, we use a
relative step size of :math:`\text{kRelativeStepSize} = 10^{-6}`. This
gives better derivative estimates than an absolute step size [#f2]_
[#f3]_. This choice of step size only works for parameter values that
are not close to zero. So the actual implementation of
:class:`NumericDiffCostFunction` uses a more complex step size
selection logic where, close to zero, it switches to a fixed step
size.
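
A minimal sketch of this kind of hybrid rule (illustrative only; the actual
logic and constants inside Ceres Solver differ) might look like:

.. code-block:: c++

  #include <algorithm>
  #include <cmath>

  // Use a relative step away from zero and fall back to a fixed absolute
  // step when the parameter is too small for a relative step to be useful.
  double StepSize(double parameter) {
    const double kRelativeStepSize = 1e-6;
    const double kMinAbsoluteStepSize = 1e-6;
    return std::max(std::abs(parameter) * kRelativeStepSize,
                    kMinAbsoluteStepSize);
  }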


Central Differences
===================

The :math:`O(h)` error in the Forward Difference formula is okay but
not great. A better method is to use the *Central Difference* formula:

.. math::
   Df(x) \approx \frac{f(x + h) - f(x - h)}{2h}

Notice that if the value of :math:`f(x)` is known, the Forward
Difference formula only requires one extra evaluation, but the Central
Difference formula requires two evaluations, making it twice as
expensive. So is the extra evaluation worth it?

To answer this question, we again compute the error of approximation
in the central difference formula:

.. math::
   \begin{align}
   f(x + h) &= f(x) + h Df(x) + \frac{h^2}{2!}
   D^2f(x) + \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) + \cdots\\
   f(x - h) &= f(x) - h Df(x) + \frac{h^2}{2!}
   D^2f(x) - \frac{h^3}{3!} D^3f(x) + \frac{h^4}{4!} D^4f(x) +
   \cdots\\
   Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!}
   D^3f(x) + \frac{h^4}{5!}
   D^5f(x) + \cdots \\
   Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + O(h^2)
   \end{align}

The error of the Central Difference formula is :math:`O(h^2)`, i.e.,
the error goes down quadratically whereas the error in the Forward
Difference formula only goes down linearly.

Using central differences instead of forward differences in Ceres
Solver is a simple matter of changing a template argument to
:class:`NumericDiffCostFunction` as follows:

.. code-block:: c++

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, CENTRAL, 1, 4>(
      new Rat43CostFunctor(x, y));

But what do these differences in the error mean in practice? To see
this, consider the problem of evaluating the derivative of the
univariate function

.. math::
   f(x) = \frac{e^x}{\sin x - x^2},

at :math:`x = 1.0`.

It is easy to determine that :math:`Df(1.0) =
140.73773557129658`. Using this value as reference, we can now compute
the relative error in the forward and central difference formulae as a
function of the absolute step size and plot them.
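
A short sketch along the following lines (our own code, not necessarily how
the figure below was generated) computes these relative errors:

.. code-block:: c++

  #include <cmath>
  #include <cstdio>

  // f(x) = e^x / (sin(x) - x^2) and the reference value of Df(1.0).
  double f(double x) { return std::exp(x) / (std::sin(x) - x * x); }
  const double kReference = 140.73773557129658;

  int main() {
    const double x = 1.0;
    // Sweep the absolute step size and report the relative errors of the
    // forward and central difference estimates of Df(1.0).
    for (double h = 1e-1; h > 1e-12; h /= 10.0) {
      const double forward = (f(x + h) - f(x)) / h;
      const double central = (f(x + h) - f(x - h)) / (2.0 * h);
      std::printf("h = %.0e  forward = %.3e  central = %.3e\n", h,
                  std::abs(forward - kReference) / kReference,
                  std::abs(central - kReference) / kReference);
    }
    return 0;
  }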

.. figure:: forward_central_error.png
   :figwidth: 100%
   :align: center

Reading the graph from right to left, a number of things stand out:

 1. The graphs for both formulae have two distinct regions. At first,
    starting from a large value of :math:`h` the error goes down as
    the effect of truncating the Taylor series dominates, but as the
    value of :math:`h` continues to decrease, the error starts
    increasing again as roundoff error starts to dominate the
    computation. So we cannot just keep on reducing the value of
    :math:`h` to get better estimates of :math:`Df`. The fact that we
    are using finite precision arithmetic becomes a limiting factor.
 2. The Forward Difference formula is not a great method for
    evaluating derivatives. The Central Difference formula converges
    much more quickly to a more accurate estimate of the derivative
    with decreasing step size. So unless the evaluation of
    :math:`f(x)` is so expensive that you absolutely cannot afford the
    extra evaluation required by central differences, **do not use the
    Forward Difference formula**.
 3. Neither formula works well for a poorly chosen value of :math:`h`.


Ridders' Method
===============

So, can we get better estimates of :math:`Df` without requiring such
small values of :math:`h` that we start hitting floating point
roundoff errors?

One possible approach is to find a method whose error goes down faster
than :math:`O(h^2)`. This can be done by applying `Richardson
Extrapolation
<https://en.wikipedia.org/wiki/Richardson_extrapolation>`_ to the
problem of differentiation. This is also known as *Ridders' Method*
[Ridders]_.

Let us recall the error in the central differences formula:

.. math::
   \begin{align}
   Df(x) & = \frac{f(x + h) - f(x - h)}{2h} + \frac{h^2}{3!}
   D^3f(x) + \frac{h^4}{5!}
   D^5f(x) + \cdots\\
   & = \frac{f(x + h) - f(x - h)}{2h} + K_2 h^2 + K_4 h^4 + \cdots
   \end{align}

The key thing to note here is that the terms :math:`K_2, K_4, ...`
are independent of :math:`h` and only depend on :math:`x`.

Let us now define:

.. math::

   A(1, m) = \frac{f(x + h/2^{m-1}) - f(x - h/2^{m-1})}{2h/2^{m-1}}.

Then observe that

.. math::

   Df(x) = A(1,1) + K_2 h^2 + K_4 h^4 + \cdots

and

.. math::

   Df(x) = A(1, 2) + K_2 (h/2)^2 + K_4 (h/2)^4 + \cdots

Here we have halved the step size to obtain a second central
differences estimate of :math:`Df(x)`. Combining these two estimates
by multiplying the second by four and subtracting the first, which
eliminates the :math:`K_2 h^2` term, we get:

.. math::

   Df(x) = \frac{4 A(1, 2) - A(1,1)}{4 - 1} + O(h^4)

which is an approximation of :math:`Df(x)` with truncation error that
goes down as :math:`O(h^4)`. But we do not have to stop here. We can
iterate this process to obtain even more accurate estimates as
follows:

.. math::

   A(n, m) = \begin{cases}
    \frac{\displaystyle f(x + h/2^{m-1}) - f(x -
    h/2^{m-1})}{\displaystyle 2h/2^{m-1}} & n = 1 \\
    \frac{\displaystyle 4^{n-1} A(n - 1, m + 1) - A(n - 1, m)}{\displaystyle 4^{n-1} - 1} & n > 1
   \end{cases}

It is straightforward to show that the approximation error in
:math:`A(n, 1)` is :math:`O(h^{2n})`. To see how the above formula can
be implemented in practice to compute :math:`A(n,1)` it is helpful to
structure the computation as the following tableau:

.. math::
   \begin{array}{ccccc}
   A(1,1) & A(1, 2) & A(1, 3) & A(1, 4) & \cdots\\
   & A(2, 1) & A(2, 2) & A(2, 3) & \cdots\\
   & & A(3, 1) & A(3, 2) & \cdots\\
   & & & A(4, 1) & \cdots \\
   & & & & \ddots
   \end{array}

So, to compute :math:`A(n, 1)` for increasing values of :math:`n` we
move from the left to the right, computing one column at a
time. Assuming that the primary cost here is the evaluation of the
function :math:`f(x)`, the cost of computing a new column of the above
tableau is two function evaluations, since computing :math:`A(1, m)`
requires evaluating the central difference formula for a step size of
:math:`2^{1-m}h`, and every other entry in the column follows from the
recurrence above at negligible cost.
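
The following sketch (our own illustrative code, not the implementation
inside Ceres Solver) fills in the tableau column by column for a univariate
function and returns :math:`A(n, 1)`:

.. code-block:: c++

  #include <cmath>
  #include <cstdio>
  #include <vector>

  // Central difference estimate of Df(x) with step size h.
  template <typename Function>
  double CentralDifference(const Function& f, double x, double h) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
  }

  // Compute A(max_order, 1) by filling the extrapolation tableau one column
  // at a time. Each new column costs two evaluations of f (for its first
  // row); the remaining entries follow from the recurrence.
  template <typename Function>
  double Ridders(const Function& f, double x, double h, int max_order) {
    std::vector<double> column(max_order);  // previous column, top to bottom
    for (int m = 1; m <= max_order; ++m) {
      double estimate = CentralDifference(f, x, h / std::pow(2.0, m - 1));
      for (int n = 2; n <= m; ++n) {
        const double factor = std::pow(4.0, n - 1);
        const double next = (factor * estimate - column[n - 2]) / (factor - 1.0);
        column[n - 2] = estimate;
        estimate = next;
      }
      column[m - 1] = estimate;
    }
    return column[max_order - 1];  // A(max_order, 1)
  }

  int main() {
    auto f = [](double x) { return std::exp(x) / (std::sin(x) - x * x); };
    // With h = 0.01 and five columns this reproduces A(5, 1) from the
    // tableau below, approximately 140.737735571.
    std::printf("%.9f\n", Ridders(f, 1.0, 0.01, 5));
    return 0;
  }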

Applying this method to :math:`f(x) = \frac{e^x}{\sin x - x^2}`
starting with a fairly large step size :math:`h = 0.01`, we get:

.. math::
   \begin{array}{rrrrr}
   141.678097131 &140.971663667 &140.796145400 &140.752333523 &140.741384778\\
   &140.736185846 &140.737639311 &140.737729564 &140.737735196\\
   & &140.737736209 &140.737735581 &140.737735571\\
   & & &140.737735571 &140.737735571\\
   & & & &140.737735571\\
   \end{array}

Compared to the *correct* value :math:`Df(1.0) = 140.73773557129658`,
:math:`A(5, 1)` has a relative error of :math:`10^{-13}`. For
comparison, the relative error for the central difference formula with
the same step size (:math:`0.01/2^4 = 0.000625`) is :math:`10^{-5}`.

The above tableau is the basis of Ridders' method for numeric
differentiation. The full implementation is an adaptive scheme that
tracks its own estimation error and stops automatically when the
desired precision is reached. Of course it is more expensive than the
forward and central difference formulae, but is also significantly
more robust and accurate.

Using Ridders' method instead of forward or central differences in
Ceres is again a simple matter of changing a template argument to
:class:`NumericDiffCostFunction` as follows:

.. code-block:: c++

  CostFunction* cost_function =
    new NumericDiffCostFunction<Rat43CostFunctor, RIDDERS, 1, 4>(
      new Rat43CostFunctor(x, y));

The following graph shows the relative error of the three methods as a
function of the absolute step size. For Ridders' method we assume
that the step size for evaluating :math:`A(n,1)` is :math:`2^{1-n}h`.

.. figure:: forward_central_ridders_error.png
   :figwidth: 100%
   :align: center

Using the 10 function evaluations that are needed to compute
:math:`A(5,1)` we are able to approximate :math:`Df(1.0)` about 1000
times better than the best central differences estimate. To put these
numbers in perspective, machine epsilon for double precision
arithmetic is :math:`\approx 2.22 \times 10^{-16}`.

Going back to ``Rat43``, let us also look at the runtime cost of the
various methods for computing numeric derivatives.

==========================  =========
CostFunction                Time (ns)
==========================  =========
Rat43Analytic                     255
Rat43AnalyticOptimized             92
Rat43NumericDiffForward           262
Rat43NumericDiffCentral           517
Rat43NumericDiffRidders          3760
==========================  =========

As expected, Central Differences is about twice as expensive as
Forward Differences and the remarkable accuracy improvements of
Ridders' method cost an order of magnitude more runtime.

Recommendations
===============

Numeric differentiation should be used when you cannot compute the
derivatives either analytically or using automatic differentiation. This
is usually the case when you are calling an external library or
function whose analytic form you do not know, or even if you do, you
are not in a position to re-write it in the manner required to use
:ref:`chapter-automatic_derivatives`.


When using numeric differentiation, use at least Central Differences,
and if execution time is not a concern or the objective function is
such that determining a good static relative step size is hard,
Ridders' method is recommended.
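
If you do stay with forward or central differences but want to tune the
step size, recent versions of Ceres Solver expose a
:class:`NumericDiffOptions` struct that can be passed to
:class:`NumericDiffCostFunction`. The following is a hedged sketch; verify
the exact constructor arguments and defaults against the version of the
library you are using:

.. code-block:: c++

  // Sketch: overriding the default relative step size. Check the
  // NumericDiffCostFunction constructor signature for your Ceres version.
  NumericDiffOptions options;
  options.relative_step_size = 1e-4;
  CostFunction* cost_function =
      new NumericDiffCostFunction<Rat43CostFunctor, CENTRAL, 1, 4>(
          new Rat43CostFunctor(x, y), TAKE_OWNERSHIP, 1, options);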

.. rubric:: Footnotes

.. [#f2] `Numerical Differentiation
   <https://en.wikipedia.org/wiki/Numerical_differentiation#Practical_considerations_using_floating_point_arithmetic>`_
.. [#f3] [Press]_ Numerical Recipes, Section 5.7
.. [#f4] In asymptotic error analysis, an error of :math:`O(h^k)`
   means that the absolute value of the error is at most some
   constant times :math:`h^k` when :math:`h` is close enough to
   :math:`0`.