.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-analytical_derivatives:

====================
Analytic Derivatives
====================

Consider the problem of fitting the following curve (`Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_) to
data:

.. math::
  y = \frac{b_1}{(1+e^{b_2-b_3x})^{1/b_4}}

That is, given some data :math:`\{x_i, y_i\},\ \forall i=1,\dots,n`,
determine parameters :math:`b_1, b_2, b_3` and :math:`b_4` that best
fit this data.

This can be stated as the problem of finding the values of
:math:`b_1, b_2, b_3` and :math:`b_4` that minimize the following
objective function [#f1]_:

.. math::
  \begin{align}
  E(b_1, b_2, b_3, b_4)
  &= \sum_i f^2(b_1, b_2, b_3, b_4 ; x_i, y_i)\\
  &= \sum_i \left(\frac{b_1}{(1+e^{b_2-b_3x_i})^{1/b_4}} - y_i\right)^2\\
  \end{align}

To solve this problem using Ceres Solver, we need to define a
:class:`CostFunction` that computes the residual :math:`f` for a given
:math:`x` and :math:`y` and its derivatives with respect to
:math:`b_1, b_2, b_3` and :math:`b_4`.

Using elementary differential calculus, we can see that:

.. math::
  \begin{align}
  D_1 f(b_1, b_2, b_3, b_4; x,y) &= \frac{1}{(1+e^{b_2-b_3x})^{1/b_4}}\\
  D_2 f(b_1, b_2, b_3, b_4; x,y) &=
  \frac{-b_1e^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  D_3 f(b_1, b_2, b_3, b_4; x,y) &=
  \frac{b_1xe^{b_2-b_3x}}{b_4(1+e^{b_2-b_3x})^{1/b_4 + 1}} \\
  D_4 f(b_1, b_2, b_3, b_4; x,y) &= \frac{b_1 \log\left(1+e^{b_2-b_3x}\right)}{b_4^2(1+e^{b_2-b_3x})^{1/b_4}}
  \end{align}

With these derivatives in hand, we can now implement the
:class:`CostFunction` as:

.. code-block:: c++

  class Rat43Analytic : public SizedCostFunction<1,4> {
   public:
    Rat43Analytic(const double x, const double y) : x_(x), y_(y) {}
    virtual ~Rat43Analytic() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      const double b1 = parameters[0][0];
      const double b2 = parameters[0][1];
      const double b3 = parameters[0][2];
      const double b4 = parameters[0][3];

      residuals[0] = b1 * pow(1 + exp(b2 - b3 * x_), -1.0 / b4) - y_;

      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      jacobian[0] = pow(1 + exp(b2 - b3 * x_), -1.0 / b4);
      jacobian[1] = -b1 * exp(b2 - b3 * x_) *
                    pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
      jacobian[2] = x_ * b1 * exp(b2 - b3 * x_) *
                    pow(1 + exp(b2 - b3 * x_), -1.0 / b4 - 1) / b4;
      jacobian[3] = b1 * log(1 + exp(b2 - b3 * x_)) *
                    pow(1 + exp(b2 - b3 * x_), -1.0 / b4) / (b4 * b4);
      return true;
    }

   private:
    const double x_;
    const double y_;
  };
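
Hand-coded Jacobians like this are easy to get subtly wrong, so
before trusting one it is worth comparing it against numeric
differentiation. Below is a minimal sketch of such a check; it is not
part of the original example, and the data point and parameter values
in it are arbitrary placeholders.

.. code-block:: c++

  #include <cstdio>

  // Compare the analytic Jacobian of Rat43Analytic against central
  // finite differences at a single (placeholder) test point.
  void CheckRat43Jacobian() {
    const double x = 1.0, y = 16.0;         // placeholder data point
    double b[4] = {700.0, 5.0, 0.75, 1.3};  // placeholder parameters
    Rat43Analytic cost_function(x, y);
    double* parameter_blocks[1] = {b};

    double residual;
    double analytic_jacobian[4];
    double* jacobian_blocks[1] = {analytic_jacobian};
    cost_function.Evaluate(parameter_blocks, &residual, jacobian_blocks);

    const double kStep = 1e-6;
    for (int i = 0; i < 4; ++i) {
      // Central difference: (f(b + h e_i) - f(b - h e_i)) / 2h.
      const double original = b[i];
      double forward, backward;
      b[i] = original + kStep;
      cost_function.Evaluate(parameter_blocks, &forward, nullptr);
      b[i] = original - kStep;
      cost_function.Evaluate(parameter_blocks, &backward, nullptr);
      b[i] = original;
      const double numeric = (forward - backward) / (2.0 * kStep);
      std::printf("D_%d analytic: % .12e numeric: % .12e\n",
                  i + 1, analytic_jacobian[i], numeric);
    }
  }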

The code in ``Rat43Analytic`` is tedious to write, hard to read, and
contains a good deal of redundancy. In practice we would cache the
common sub-expressions to improve efficiency, which gives us something
like:

.. code-block:: c++

  class Rat43AnalyticOptimized : public SizedCostFunction<1,4> {
   public:
    Rat43AnalyticOptimized(const double x, const double y) : x_(x), y_(y) {}
    virtual ~Rat43AnalyticOptimized() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      const double b1 = parameters[0][0];
      const double b2 = parameters[0][1];
      const double b3 = parameters[0][2];
      const double b4 = parameters[0][3];

      const double t1 = exp(b2 - b3 * x_);
      const double t2 = 1 + t1;
      const double t3 = pow(t2, -1.0 / b4);
      residuals[0] = b1 * t3 - y_;

      if (!jacobians) return true;
      double* jacobian = jacobians[0];
      if (!jacobian) return true;

      const double t4 = pow(t2, -1.0 / b4 - 1);
      jacobian[0] = t3;
      jacobian[1] = -b1 * t1 * t4 / b4;
      jacobian[2] = -x_ * jacobian[1];
      jacobian[3] = b1 * log(t2) * t3 / (b4 * b4);
      return true;
    }

   private:
    const double x_;
    const double y_;
  };
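
Either implementation plugs into a least squares problem in the usual
way. The following sketch is illustrative rather than part of the
original example; it assumes the observations live in vectors ``xs``
and ``ys``, and the starting values for ``b`` are placeholders.

.. code-block:: c++

  #include <vector>

  #include "ceres/ceres.h"

  void SolveRat43(const std::vector<double>& xs,
                  const std::vector<double>& ys) {
    double b[4] = {100.0, 10.0, 1.0, 1.0};  // placeholder initial guess

    // One residual block per observation; the problem takes ownership
    // of the cost functions.
    ceres::Problem problem;
    for (size_t i = 0; i < xs.size(); ++i) {
      problem.AddResidualBlock(
          new Rat43AnalyticOptimized(xs[i], ys[i]),
          /* loss_function = */ nullptr,
          b);
    }

    ceres::Solver::Options options;
    ceres::Solver::Summary summary;
    ceres::Solve(options, &problem, &summary);
  }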

What is the difference in performance of these two implementations?

========================== =========
CostFunction               Time (ns)
========================== =========
Rat43Analytic                    255
Rat43AnalyticOptimized            92
========================== =========

``Rat43AnalyticOptimized`` is :math:`2.8` times faster than
``Rat43Analytic``. This difference in run-time is not uncommon. To
get the best performance out of analytically computed derivatives, one
usually needs to optimize the code to account for common
sub-expressions.
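
Timing differences like these are straightforward to measure with a
micro-benchmark. The sketch below is a simple illustration, not the
benchmark used to produce the table above; the parameter values are
placeholders, and a real benchmark would also need to keep the
compiler from optimizing the loop away.

.. code-block:: c++

  #include <chrono>

  #include "ceres/ceres.h"

  // Returns the average time per call to Evaluate() in nanoseconds.
  double TimeEvaluate(const ceres::CostFunction& cost_function,
                      const int num_iterations) {
    double b[4] = {700.0, 5.0, 0.75, 1.3};  // placeholder parameters
    double* parameter_blocks[1] = {b};
    double residual;
    double jacobian[4];
    double* jacobian_blocks[1] = {jacobian};

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < num_iterations; ++i) {
      cost_function.Evaluate(parameter_blocks, &residual, jacobian_blocks);
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() /
           num_iterations;
  }

  // Usage:
  //   Rat43Analytic baseline(1.0, 16.0);
  //   Rat43AnalyticOptimized optimized(1.0, 16.0);
  //   TimeEvaluate(baseline, 1000000);
  //   TimeEvaluate(optimized, 1000000);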


When should you use analytical derivatives?
===========================================

#. The expressions are simple, e.g. mostly linear.

#. A computer algebra system like `Maple
   <https://www.maplesoft.com/products/maple/>`_, `Mathematica
   <https://www.wolfram.com/mathematica/>`_, or `SymPy
   <http://www.sympy.org/en/index.html>`_ can be used to differentiate
   the objective function symbolically and to generate the C++ code
   that evaluates the resulting derivatives.

#. Performance is of utmost concern and there is algebraic structure
   in the terms that you can exploit to get better performance than
   automatic differentiation.

   That said, getting the best performance out of analytical
   derivatives requires a non-trivial amount of work. Before going
   down this path, it is useful to measure the amount of time being
   spent evaluating the Jacobian as a fraction of the total solve time
   and remember that `Amdahl's Law
   <https://en.wikipedia.org/wiki/Amdahl's_law>`_ is your friend: if
   the Jacobian accounts for a fraction :math:`p` of the solve time,
   optimizing it can speed up the solve by at most a factor of
   :math:`1/(1-p)`.

#. There is no other way to compute the derivatives, e.g. you
   wish to compute the derivative of the root of a polynomial:

   .. math::
     a_3(x,y)z^3 + a_2(x,y)z^2 + a_1(x,y)z + a_0(x,y) = 0

   with respect to :math:`x` and :math:`y`. This requires the use of
   the `Inverse Function Theorem
   <https://en.wikipedia.org/wiki/Inverse_function_theorem>`_; a
   sketch of the resulting expression follows this list.

#. You love the chain rule and actually enjoy doing all the algebra by
   hand.

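For the polynomial root example above, the derivative follows from
differentiating the defining identity: treating :math:`z` as a
function :math:`z(x, y)` and applying the chain rule with respect to
:math:`x` gives

.. math::
  D_x z = -\frac{D_x a_3 z^3 + D_x a_2 z^2 + D_x a_1 z + D_x a_0}
                {3 a_3 z^2 + 2 a_2 z + a_1},

which is valid wherever the denominator, the derivative of the
polynomial with respect to :math:`z`, is non-zero; the expression for
:math:`D_y z` is analogous.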

.. rubric:: Footnotes

.. [#f1] The notion of best fit depends on the choice of the objective
         function used to measure the quality of fit, which in turn
         depends on the underlying noise process which generated the
         observations. Minimizing the sum of squared differences is
         the right thing to do when the noise is `Gaussian
         <https://en.wikipedia.org/wiki/Normal_distribution>`_. In
         that case the optimal value of the parameters is the `Maximum
         Likelihood Estimate
         <https://en.wikipedia.org/wiki/Maximum_likelihood_estimation>`_.