.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-automatic_derivatives:

=====================
Automatic Derivatives
=====================

We will now consider automatic differentiation. It is a technique that
can compute exact derivatives, fast, while requiring about the same
effort from the user as is needed to use numerical differentiation.

Don't believe me? Well here goes. The following code fragment
implements an automatically differentiated ``CostFunction`` for `Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_.

.. code-block:: c++

  struct Rat43CostFunctor {
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

    template <typename T>
    bool operator()(const T* parameters, T* residuals) const {
      const T b1 = parameters[0];
      const T b2 = parameters[1];
      const T b3 = parameters[2];
      const T b4 = parameters[3];
      residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
      return true;
    }

   private:
    const double x_;
    const double y_;
  };


  CostFunction* cost_function =
      new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
          new Rat43CostFunctor(x, y));

Notice that compared to numeric differentiation, the only difference
when defining the functor for use with automatic differentiation is
the signature of the ``operator()``.

In the case of numeric differentiation it was

.. code-block:: c++

  bool operator()(const double* parameters, double* residuals) const;

and for automatic differentiation it is a templated function of the
form

.. code-block:: c++

  template <typename T> bool operator()(const T* parameters, T* residuals) const;


So what does this small change buy us? The following table compares
the time it takes to evaluate the residual and the Jacobian for
`Rat43` using various methods.

========================== =========
CostFunction               Time (ns)
========================== =========
Rat43Analytic                    255
Rat43AnalyticOptimized            92
Rat43NumericDiffForward          262
Rat43NumericDiffCentral          517
Rat43NumericDiffRidders         3760
Rat43AutomaticDiff               129
========================== =========

We can get exact derivatives using automatic differentiation
(``Rat43AutomaticDiff``) with about the same effort that is required
to write the code for numeric differentiation but only :math:`40\%`
slower than hand optimized analytical derivatives.

So how does it work? For this we will have to learn about **Dual
Numbers** and **Jets**.


Dual Numbers & Jets
===================

.. NOTE::

   Reading this and the next section on implementing Jets is not
   necessary to use automatic differentiation in Ceres Solver. But
   knowing the basics of how Jets work is useful when debugging and
   reasoning about the performance of automatic differentiation.

Dual numbers are an extension of the real numbers analogous to complex
numbers: whereas complex numbers augment the reals by introducing an
imaginary unit :math:`\iota` such that :math:`\iota^2 = -1`, dual
numbers introduce an *infinitesimal* unit :math:`\epsilon` such that
:math:`\epsilon^2 = 0`. A dual number :math:`a + v\epsilon` has two
components, the *real* component :math:`a` and the *infinitesimal*
component :math:`v`.

Surprisingly, this simple change leads to a convenient method for
computing exact derivatives without needing to manipulate complicated
symbolic expressions.

For example, consider the function

.. math::

   f(x) = x^2 .

Then,

.. math::

   \begin{align}
   f(10 + \epsilon) &= (10 + \epsilon)^2\\
                    &= 100 + 20 \epsilon + \epsilon^2\\
                    &= 100 + 20 \epsilon
   \end{align}

Observe that the coefficient of :math:`\epsilon` is :math:`Df(10) =
20`. Indeed this generalizes to functions which are not
polynomial. Consider an arbitrary differentiable function
:math:`f(x)`. Then we can evaluate :math:`f(x + \epsilon)` by
considering the Taylor expansion of :math:`f` near :math:`x`, which
gives us the infinite series

.. math::

   \begin{align}
   f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x)
   \frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\
   f(x + \epsilon) &= f(x) + Df(x) \epsilon
   \end{align}

Here we are using the fact that :math:`\epsilon^2 = 0`.
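
Dual number arithmetic is simple enough to fit in a few lines of
code. The following is a minimal sketch (the ``Dual`` type and its
lone overloaded operator are purely illustrative, not part of Ceres)
that recovers :math:`Df(10) = 20` for :math:`f(x) = x^2`:

.. code-block:: c++

  #include <iostream>

  // A scalar dual number a + v * epsilon, with epsilon^2 = 0.
  struct Dual {
    double a;  // real part
    double v;  // infinitesimal part
  };

  // (a + v e)(b + w e) = ab + (aw + vb) e; the e^2 term vanishes.
  Dual operator*(const Dual& f, const Dual& g) {
    return Dual{f.a * g.a, f.a * g.v + f.v * g.a};
  }

  int main() {
    const Dual x{10.0, 1.0};  // x = 10 + epsilon
    const Dual y = x * x;     // f(x) = x^2
    std::cout << y.a << " " << y.v << "\n";  // prints "100 20"
    return 0;
  }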

A `Jet <https://en.wikipedia.org/wiki/Jet_(mathematics)>`_ is an
:math:`n`-dimensional dual number, where we augment the real numbers
with :math:`n` infinitesimal units :math:`\epsilon_i,\ i=1,...,n` with
the property that :math:`\forall i, j\ :\epsilon_i\epsilon_j = 0`. Then
a Jet consists of a *real* part :math:`a` and an :math:`n`-dimensional
*infinitesimal* part :math:`\mathbf{v}`, i.e.,

.. math::

   x = a + \sum_j v_{j} \epsilon_j

The summation notation gets tedious, so we will also just write

.. math::

   x = a + \mathbf{v},

where the :math:`\epsilon_i`'s are implicit. Then, using the same
Taylor series expansion used above, we can see that:

.. math::

   f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.

Similarly for a multivariate function
:math:`f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m`, evaluated on
:math:`x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n`:

.. math::

   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i

So if each :math:`\mathbf{v}_i = e_i` were the :math:`i^{\text{th}}`
standard basis vector, then the above expression would simplify to

.. math::

   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i

and we can extract the coordinates of the Jacobian by inspecting the
coefficients of :math:`\epsilon_i`.
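
To make the bookkeeping concrete, take :math:`f(x_1, x_2) = x_1 x_2`
and seed the inputs with the standard basis vectors as above. Then,
dropping all products of infinitesimals,

.. math::

   \begin{align}
   f(a_1 + \epsilon_1, a_2 + \epsilon_2) &= (a_1 + \epsilon_1)(a_2 + \epsilon_2)\\
   &= a_1 a_2 + a_2 \epsilon_1 + a_1 \epsilon_2,
   \end{align}

and the coefficients of :math:`\epsilon_1` and :math:`\epsilon_2` are
exactly the partial derivatives :math:`D_1 f = a_2` and :math:`D_2 f =
a_1`.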

Implementing Jets
-----------------

In order for the above to work in practice, we will need the ability
to evaluate an arbitrary function :math:`f` not just on real numbers
but also on dual numbers; however, one does not usually evaluate
functions by evaluating their Taylor expansions.

This is where C++ templates and operator overloading come into
play. The following code fragment has a simple implementation of a
``Jet`` and some operators/functions that operate on them.

.. code-block:: c++

  #include <cmath>
  #include <Eigen/Core>

  template <int N> struct Jet {
    double a;                        // The real part.
    Eigen::Matrix<double, 1, N> v;   // The infinitesimal part.

    Jet() : a(0.0) { v.setZero(); }
    Jet(double value, const Eigen::Matrix<double, 1, N>& deriv)
        : a(value), v(deriv) {}
  };

  template <int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a + g.a, f.v + g.v);
  }

  template <int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a - g.a, f.v - g.v);
  }

  template <int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
  }

  template <int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
  }

  template <int N> Jet<N> exp(const Jet<N>& f) {
    return Jet<N>(exp(f.a), exp(f.a) * f.v);
  }

  // This is a simple implementation for illustration purposes; the
  // actual implementation of pow requires careful handling of a number
  // of corner cases.
  template <int N> Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(pow(f.a, g.a),
                  g.a * pow(f.a, g.a - 1.0) * f.v +
                      pow(f.a, g.a) * log(f.a) * g.v);
  }
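
As a quick sanity check of this toy ``Jet`` (a sketch, not the actual
Ceres implementation), we can differentiate :math:`e^x` at :math:`x =
2` by seeding the infinitesimal part with one:

.. code-block:: c++

  Jet<1> x(2.0, Eigen::Matrix<double, 1, 1>::Ones());  // x = 2 + epsilon
  Jet<1> y = exp(x);
  // y.a is exp(2), the value of the function, and y.v[0] is also
  // exp(2), the derivative of exp at x = 2.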

With these overloaded functions in hand, we can now call
``Rat43CostFunctor`` with an array of Jets instead of doubles. Putting
that together with appropriately initialized Jets allows us to compute
the Jacobian as follows:

.. code-block:: c++

  class Rat43Automatic : public ceres::SizedCostFunction<1, 4> {
   public:
    Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
    virtual ~Rat43Automatic() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      // Just evaluate the residuals if Jacobians are not required.
      if (!jacobians) return (*functor_)(parameters[0], residuals);

      // Initialize the Jets, seeding the i-th infinitesimal part with
      // the i-th standard basis vector.
      Jet<4> jets[4];
      for (int i = 0; i < 4; ++i) {
        jets[i].a = parameters[0][i];
        jets[i].v.setZero();
        jets[i].v[i] = 1.0;
      }

      Jet<4> result;
      (*functor_)(jets, &result);

      // Copy the values out of the Jet.
      residuals[0] = result.a;
      for (int i = 0; i < 4; ++i) {
        jacobians[0][i] = result.v[i];
      }
      return true;
    }

   private:
    std::unique_ptr<const Rat43CostFunctor> functor_;
  };

Indeed, this is essentially how :class:`AutoDiffCostFunction` works.
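
For completeness, here is a usage sketch of the hand-rolled wrapper
above; the data point and parameter values are made up for
illustration:

.. code-block:: c++

  double parameters[4] = {100.0, 10.0, 1.0, 1.0};
  double* parameter_blocks[1] = {parameters};
  double residual;
  double jacobian[4];
  double* jacobian_blocks[1] = {jacobian};

  Rat43Automatic cost_function(new Rat43CostFunctor(1.0, 16.08));
  cost_function.Evaluate(parameter_blocks, &residual, jacobian_blocks);
  // residual now holds the residual value and jacobian the four
  // partial derivatives with respect to b1, ..., b4.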


Pitfalls
========

Automatic differentiation frees the user from the burden of computing
and reasoning about the symbolic expressions for the Jacobians, but
this freedom comes at a cost. For example, consider the following
simple functor:

.. code-block:: c++

  struct Functor {
    template <typename T> bool operator()(const T* x, T* residual) const {
      residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
      return true;
    }
  };

Looking at the code for the residual computation, one does not foresee
any problems. However, if we look at the analytical expressions for
the Jacobian:

.. math::

   y &= 1 - \sqrt{x_0^2 + x_1^2}\\
   D_1 y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\
   D_2 y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}

we find that it is an indeterminate form at :math:`x_0 = 0, x_1 =
0`.

There is no single solution to this problem. In some cases one needs
to reason explicitly about the points where indeterminacy may occur
and use alternate expressions using `L'Hopital's rule
<https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
example some of the conversion routines in `rotation.h
<https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_). In
other cases, one may need to regularize the expressions to eliminate
these points.
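
As an illustration of reasoning explicitly about the indeterminate
point, here is one possible way (a sketch, not an official Ceres
recipe) to guard the functor above, by special-casing the origin where
the square root is not differentiable:

.. code-block:: c++

  struct Functor {
    template <typename T> bool operator()(const T* x, T* residual) const {
      const T squared_norm = x[0] * x[0] + x[1] * x[1];
      if (squared_norm > T(0.0)) {
        residual[0] = T(1.0) - sqrt(squared_norm);
      } else {
        // At the origin the residual is exactly 1; assigning a
        // constant here makes the derivative zero, avoiding the 0/0
        // that would otherwise propagate NaNs into the Jacobian.
        residual[0] = T(1.0);
      }
      return true;
    }
  };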