.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-automatic_derivatives:

=====================
Automatic Derivatives
=====================

We will now consider automatic differentiation. It is a technique
that can compute exact derivatives, fast, while requiring about the
same effort from the user as is needed to use numerical
differentiation.

Don't believe me? Well here goes. The following code fragment
implements an automatically differentiated ``CostFunction`` for `Rat43
<http://www.itl.nist.gov/div898/strd/nls/data/ratkowsky3.shtml>`_.

.. code-block:: c++

  struct Rat43CostFunctor {
    Rat43CostFunctor(const double x, const double y) : x_(x), y_(y) {}

    template <typename T>
    bool operator()(const T* parameters, T* residuals) const {
      const T b1 = parameters[0];
      const T b2 = parameters[1];
      const T b3 = parameters[2];
      const T b4 = parameters[3];
      residuals[0] = b1 * pow(1.0 + exp(b2 - b3 * x_), -1.0 / b4) - y_;
      return true;
    }

   private:
    const double x_;
    const double y_;
  };


  CostFunction* cost_function =
      new AutoDiffCostFunction<Rat43CostFunctor, 1, 4>(
          new Rat43CostFunctor(x, y));
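
Here the first template argument to :class:`AutoDiffCostFunction` is
the functor type, and the arguments ``1, 4`` are the number of
residuals and the size of the parameter block, respectively.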

Notice that compared to numeric differentiation, the only difference
when defining the functor for use with automatic differentiation is
the signature of the ``operator()``.

In the case of numeric differentiation it was

.. code-block:: c++

  bool operator()(const double* parameters, double* residuals) const;

and for automatic differentiation it is a templated function of the
form

.. code-block:: c++

  template <typename T> bool operator()(const T* parameters, T* residuals) const;


So what does this small change buy us? The following table compares
the time it takes to evaluate the residual and the Jacobian for
`Rat43` using various methods.

==========================  =========
CostFunction                Time (ns)
==========================  =========
Rat43Analytic                     255
Rat43AnalyticOptimized             92
Rat43NumericDiffForward           262
Rat43NumericDiffCentral           517
Rat43NumericDiffRidders          3760
Rat43AutomaticDiff                129
==========================  =========
We can get exact derivatives using automatic differentiation
(``Rat43AutomaticDiff``) with about the same effort that is required
to write the code for numeric differentiation, and the result is only
about :math:`40\%` slower than hand-optimized analytic derivatives.

So how does it work? For this we will have to learn about **Dual
Numbers** and **Jets**.


Dual Numbers & Jets
===================

.. NOTE::

   Reading this and the next section on implementing Jets is not
   necessary to use automatic differentiation in Ceres Solver. But
   knowing the basics of how Jets work is useful when debugging and
   reasoning about the performance of automatic differentiation.

Dual numbers are an extension of the real numbers analogous to complex
numbers: whereas complex numbers augment the reals by introducing an
imaginary unit :math:`\iota` such that :math:`\iota^2 = -1`, dual
numbers introduce an *infinitesimal* unit :math:`\epsilon` such that
:math:`\epsilon^2 = 0`. A dual number :math:`a + v\epsilon` has two
components, the *real* component :math:`a` and the *infinitesimal*
component :math:`v`.

Surprisingly, this simple change leads to a convenient method for
computing exact derivatives without needing to manipulate complicated
symbolic expressions.

For example, consider the function

.. math::

   f(x) = x^2 .

Then,

.. math::

   \begin{align}
   f(10 + \epsilon) &= (10 + \epsilon)^2\\
   &= 100 + 20 \epsilon + \epsilon^2\\
   &= 100 + 20 \epsilon
   \end{align}

Observe that the coefficient of :math:`\epsilon` is :math:`Df(10) =
20`. Indeed, this generalizes to functions that are not
polynomials. Consider an arbitrary differentiable function
:math:`f(x)`. Then we can evaluate :math:`f(x + \epsilon)` by
considering the Taylor expansion of :math:`f` near :math:`x`, which
gives us the infinite series

.. math::
   \begin{align}
   f(x + \epsilon) &= f(x) + Df(x) \epsilon + D^2f(x)
   \frac{\epsilon^2}{2} + D^3f(x) \frac{\epsilon^3}{6} + \cdots\\
   f(x + \epsilon) &= f(x) + Df(x) \epsilon
   \end{align}

Here we are using the fact that :math:`\epsilon^2 = 0`, so all terms
of second and higher order vanish.
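
To make this concrete, here is a minimal sketch of one-dimensional
dual number arithmetic (an illustration, not Ceres code); evaluating
:math:`f(x) = x^2` at :math:`10 + \epsilon` recovers :math:`Df(10)`
as the infinitesimal part.

.. code-block:: c++

  #include <iostream>

  // A one-dimensional dual number a + v * e, with e * e = 0.
  struct Dual {
    double a;  // real part
    double v;  // infinitesimal part
  };

  // (f.a + f.v e)(g.a + g.v e) = f.a g.a + (f.a g.v + f.v g.a) e,
  // because the e^2 term vanishes.
  Dual operator*(const Dual& f, const Dual& g) {
    return {f.a * g.a, f.a * g.v + f.v * g.a};
  }

  int main() {
    const Dual x{10.0, 1.0};  // x = 10 + e
    const Dual fx = x * x;    // f(x) = x^2
    std::cout << fx.a << " " << fx.v << "\n";  // prints "100 20"
    return 0;
  }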

A `Jet <https://en.wikipedia.org/wiki/Jet_(mathematics)>`_ is an
:math:`n`-dimensional dual number, where we augment the real numbers
with :math:`n` infinitesimal units :math:`\epsilon_i,\ i=1,...,n` with
the property that :math:`\forall i, j\ :\epsilon_i\epsilon_j = 0`. Then
a Jet consists of a *real* part :math:`a` and an :math:`n`-dimensional
*infinitesimal* part :math:`\mathbf{v}`, i.e.,

.. math::
   x = a + \sum_j v_{j} \epsilon_j

The summation notation gets tedious, so we will also just write

.. math::
   x = a + \mathbf{v},

where the :math:`\epsilon_i`'s are implicit. Then, using the same
Taylor series expansion used above, we can see that

.. math::

   f(a + \mathbf{v}) = f(a) + Df(a) \mathbf{v}.

Similarly for a multivariate function
:math:`f:\mathbb{R}^{n}\rightarrow \mathbb{R}^m`, evaluated on
:math:`x_i = a_i + \mathbf{v}_i,\ \forall i = 1,...,n`:

.. math::
   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \mathbf{v}_i

So if each :math:`\mathbf{v}_i = e_i` were the :math:`i^{\text{th}}`
standard basis vector, then the above expression would simplify to

.. math::
   f(x_1, ..., x_n) = f(a_1, ..., a_n) + \sum_i D_i f(a_1, ..., a_n) \epsilon_i

and we can extract the coordinates of the Jacobian by inspecting the
coefficients of :math:`\epsilon_i`.
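
For example, for :math:`f(x_1, x_2) = x_1 x_2` evaluated on
:math:`x_1 = a_1 + \epsilon_1` and :math:`x_2 = a_2 + \epsilon_2`,

.. math::
   f(x_1, x_2) = a_1 a_2 + a_2 \epsilon_1 + a_1 \epsilon_2,

since the :math:`\epsilon_1 \epsilon_2` term vanishes; the
coefficients of :math:`\epsilon_1` and :math:`\epsilon_2` are exactly
the partial derivatives :math:`D_1 f = a_2` and :math:`D_2 f = a_1`.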

Implementing Jets
-----------------

In order for the above to work in practice, we will need the ability
to evaluate an arbitrary function :math:`f` not just on real numbers
but also on dual numbers. One does not, however, usually evaluate
functions by evaluating their Taylor expansions.

This is where C++ templates and operator overloading come into
play. The following code fragment has a simple implementation of a
``Jet`` and some operators/functions that operate on them.

.. code-block:: c++

  template <int N> struct Jet {
    Jet() : a(0.0) { v.setZero(); }
    Jet(double value, const Eigen::Matrix<double, 1, N>& derivatives)
        : a(value), v(derivatives) {}

    double a;                       // The real part.
    Eigen::Matrix<double, 1, N> v;  // The infinitesimal part.
  };

  template <int N> Jet<N> operator+(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a + g.a, f.v + g.v);
  }

  template <int N> Jet<N> operator-(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a - g.a, f.v - g.v);
  }

  // Product rule: (fg)' = f'g + fg'.
  template <int N> Jet<N> operator*(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a * g.a, f.a * g.v + f.v * g.a);
  }

  // Quotient rule: (f/g)' = f'/g - f g'/g^2.
  template <int N> Jet<N> operator/(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(f.a / g.a, f.v / g.a - f.a * g.v / (g.a * g.a));
  }

  // Chain rule: exp(f)' = exp(f) f'.
  template <int N> Jet<N> exp(const Jet<N>& f) {
    return Jet<N>(exp(f.a), exp(f.a) * f.v);
  }

  // This is a simple implementation for illustration purposes; the
  // actual implementation of pow requires careful handling of a number
  // of corner cases.
  template <int N> Jet<N> pow(const Jet<N>& f, const Jet<N>& g) {
    return Jet<N>(pow(f.a, g.a),
                  g.a * pow(f.a, g.a - 1.0) * f.v +
                      pow(f.a, g.a) * log(f.a) * g.v);
  }
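
As a quick sanity check that the product and chain rules compose
through these overloads, consider differentiating :math:`h(x) = x e^x`
at :math:`x = 2` (a hypothetical snippet using the ``Jet`` just
defined):

.. code-block:: c++

  Jet<1> x;
  x.a = 2.0;     // The evaluation point.
  x.v[0] = 1.0;  // Seed the derivative dx/dx = 1.

  const Jet<1> h = exp(x) * x;
  // h.a    == 2 * exp(2), the value h(2).
  // h.v[0] == 3 * exp(2), since Dh(x) = e^x (x + 1).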


With these overloaded functions in hand, we can now call
``Rat43CostFunctor`` with an array of Jets instead of doubles. Putting
that together with appropriately initialized Jets allows us to compute
the Jacobian as follows:

.. code-block:: c++

  class Rat43Automatic : public ceres::SizedCostFunction<1,4> {
   public:
    Rat43Automatic(const Rat43CostFunctor* functor) : functor_(functor) {}
    virtual ~Rat43Automatic() {}
    virtual bool Evaluate(double const* const* parameters,
                          double* residuals,
                          double** jacobians) const {
      // Just evaluate the residuals if Jacobians are not required.
      if (!jacobians) return (*functor_)(parameters[0], residuals);

      // Initialize the Jets
      ceres::Jet<4> jets[4];
      for (int i = 0; i < 4; ++i) {
        jets[i].a = parameters[0][i];
        jets[i].v.setZero();
        jets[i].v[i] = 1.0;
      }

      ceres::Jet<4> result;
      (*functor_)(jets, &result);

      // Copy the values out of the Jet.
      residuals[0] = result.a;
      for (int i = 0; i < 4; ++i) {
        jacobians[0][i] = result.v[i];
      }
      return true;
    }

   private:
    std::unique_ptr<const Rat43CostFunctor> functor_;
  };

Indeed, this is essentially how :class:`AutoDiffCostFunction` works.

Pitfalls
========

Automatic differentiation frees the user from the burden of computing
and reasoning about the symbolic expressions for the Jacobians, but
this freedom comes at a cost. For example, consider the following
simple functor:

.. code-block:: c++

  struct Functor {
    template <typename T> bool operator()(const T* x, T* residual) const {
      residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1]);
      return true;
    }
  };

Looking at the code for the residual computation, one does not foresee
any problems. However, if we look at the analytical expressions for
the Jacobian:

.. math::

   \begin{align}
   y &= 1 - \sqrt{x_0^2 + x_1^2}\\
   D_1y &= -\frac{x_0}{\sqrt{x_0^2 + x_1^2}},\
   D_2y = -\frac{x_1}{\sqrt{x_0^2 + x_1^2}}
   \end{align}

we find that it is an indeterminate form at :math:`x_0 = 0, x_1 =
0`.

There is no single solution to this problem. In some cases one needs
to reason explicitly about the points where indeterminacy may occur
and use alternate expressions based on `L'Hopital's rule
<https://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule>`_ (see for
example some of the conversion routines in `rotation.h
<https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/rotation.h>`_). In
other cases, one may need to regularize the expressions to eliminate
these points.
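
For instance, one possible (hypothetical) regularization of the
functor above bounds the argument of ``sqrt`` away from zero, so that
the Jacobian is well defined everywhere, at the price of a small bias
in the residual near the origin:

.. code-block:: c++

  struct RegularizedFunctor {
    template <typename T> bool operator()(const T* x, T* residual) const {
      // kEpsilon is an illustrative choice, trading a small bias in
      // the residual for a Jacobian that is defined at x = (0, 0).
      const double kEpsilon = 1e-12;
      residual[0] = 1.0 - sqrt(x[0] * x[0] + x[1] * x[1] + kEpsilon);
      return true;
    }
  };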