.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-nnls_covariance:

=====================
Covariance Estimation
=====================

Introduction
============

One way to assess the quality of the solution returned by a non-linear
least squares solver is to analyze the covariance of the solution.

Let us consider the non-linear regression problem

.. math:: y = f(x) + N(0, I)

i.e., the observation :math:`y` is a random non-linear function of the
independent variable :math:`x` with mean :math:`f(x)` and identity
covariance. Then the maximum likelihood estimate of :math:`x` given
observations :math:`y` is the solution to the non-linear least squares
problem:

.. math:: x^* = \arg \min_x \|f(x)\|^2

And the covariance of :math:`x^*` is given by

.. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{-1}

Here :math:`J(x^*)` is the Jacobian of :math:`f` at :math:`x^*`. The
above formula assumes that :math:`J(x^*)` has full column rank.

If :math:`J(x^*)` is rank deficient, then the covariance matrix :math:`C(x^*)`
is also rank deficient and is given by the Moore-Penrose pseudo inverse.

.. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{\dagger}
Note that in the above, we assumed that the covariance matrix for
:math:`y` was identity. This is an important assumption. If this is
not the case and we have

.. math:: y = f(x) + N(0, S)

where :math:`S` is a positive semi-definite matrix denoting the
covariance of :math:`y`, then the maximum likelihood problem to be
solved is

.. math:: x^* = \arg \min_x f'(x) S^{-1} f(x)

and the corresponding covariance estimate of :math:`x^*` is given by

.. math:: C(x^*) = \left(J'(x^*) S^{-1} J(x^*)\right)^{-1}

So, if the observations being fitted have a covariance matrix that is
not the identity, then it is the user's responsibility to ensure that
the corresponding cost functions are correctly scaled, e.g. in the
above case the cost function for this problem should evaluate
:math:`S^{-1/2} f(x)` instead of just :math:`f(x)`, where
:math:`S^{-1/2}` is the inverse square root of the covariance matrix
:math:`S`.
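As an illustration, here is a minimal sketch of how such scaling might
be implemented with automatic differentiation. The model :math:`f`,
the dimensions, and all names here are hypothetical; the only
essential ingredient is that the functor returns the whitened residual
:math:`S^{-1/2}(y - f(x))`, computed via the Cholesky factorization
:math:`S = LL'`.

.. code-block:: c++

   #include "Eigen/Dense"
   #include "ceres/ceres.h"

   // Hypothetical residual functor for observations y with a known,
   // non-identity covariance S. With S = L L' (Cholesky), the
   // whitening matrix W = L^{-1} satisfies W' W = S^{-1}, so the
   // whitened residual W (y - f(x)) has identity covariance, which is
   // what the covariance estimation below assumes.
   struct WhitenedResidual {
     WhitenedResidual(const Eigen::Vector2d& y, const Eigen::Matrix2d& S)
         : y_(y),
           W_(Eigen::LLT<Eigen::Matrix2d>(S).matrixL().solve(
               Eigen::Matrix2d::Identity())) {}

     template <typename T>
     bool operator()(const T* const x, T* residual) const {
       // A stand-in model f(x); any model would do here.
       Eigen::Matrix<T, 2, 1> r;
       r(0) = T(y_(0)) - x[0];
       r(1) = T(y_(1)) - x[0] * x[0];
       // Whitened residual S^{-1/2} (y - f(x)).
       const Eigen::Matrix<T, 2, 1> wr = W_.cast<T>() * r;
       residual[0] = wr(0);
       residual[1] = wr(1);
       return true;
     }

     const Eigen::Vector2d y_;
     const Eigen::Matrix2d W_;  // W' W = S^{-1}
   };

   // Usage sketch:
   //   problem.AddResidualBlock(
   //       new ceres::AutoDiffCostFunction<WhitenedResidual, 2, 1>(
   //           new WhitenedResidual(y, S)), nullptr, &x);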

Gauge Invariance
================

In structure from motion (3D reconstruction) problems, the
reconstruction is ambiguous up to a similarity transform. This is
known as a *Gauge Ambiguity*. Handling Gauges correctly requires the
use of SVD or custom inversion algorithms. For small problems the
user can use the dense algorithm. For more details see the work of
Kanatani & Morris [KanataniMorris]_.

:class:`Covariance`
===================

:class:`Covariance` allows the user to evaluate the covariance for a
non-linear least squares problem and provides random access to its
blocks. The computation assumes that the cost functions compute
residuals such that their covariance is identity.

Since the computation of the covariance matrix requires computing the
inverse of a potentially large matrix, this can involve a rather large
amount of time and memory. However, it is usually the case that the
user is only interested in a small part of the covariance matrix,
quite often just its block diagonal. :class:`Covariance` allows the
user to specify the parts of the covariance matrix that she is
interested in and then uses this information to only compute and store
those parts of the covariance matrix.
Rank of the Jacobian
====================

As we noted above, if the Jacobian is rank deficient, then the inverse
of :math:`J'J` is not defined and instead a pseudo inverse needs to be
computed.

The rank deficiency in :math:`J` can be *structural* -- columns which
are always known to be zero -- or *numerical*, i.e., dependent on the
exact values in the Jacobian.

Structural rank deficiency occurs when the problem contains parameter
blocks that are constant. :class:`Covariance` handles this kind of
structural rank deficiency correctly.

Numerical rank deficiency, where the rank of the matrix cannot be
predicted by its sparsity structure and requires looking at its
numerical values, is more complicated. Here again there are two
cases.
a. The rank deficiency arises from overparameterization, e.g., a
   four dimensional quaternion used to parameterize :math:`SO(3)`,
   which is a three dimensional manifold. In cases like this, the
   user should use an appropriate :class:`LocalParameterization`
   (see the sketch after this list). Not only will this lead to
   better numerical behaviour of the Solver, it will also expose the
   rank deficiency to the :class:`Covariance` object so that it can
   handle it correctly.

b. More general numerical rank deficiency in the Jacobian requires
   the computation of the so-called Singular Value Decomposition
   (SVD) of :math:`J'J`. We do not know how to do this for large
   sparse matrices efficiently. For small and moderate sized
   problems this is done using dense linear algebra.
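For case (a), a minimal sketch of how the quaternion example might
look; ``problem`` and the block ``q`` are hypothetical, while
:class:`QuaternionParameterization` is the local parameterization
Ceres provides for ``w, x, y, z`` quaternions.

.. code-block:: c++

   // A hypothetical quaternion parameter block. Attaching a local
   // parameterization makes the solver -- and Covariance -- work in
   // the 3-dimensional tangent space of SO(3) rather than the ambient
   // R^4, exposing the overparameterization instead of leaving it as
   // a numerical rank deficiency in the Jacobian.
   double q[4] = {1.0, 0.0, 0.0, 0.0};  // w, x, y, z
   problem.AddParameterBlock(q, 4, new ceres::QuaternionParameterization);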

:class:`Covariance::Options`
============================

.. class:: Covariance::Options

.. member:: int Covariance::Options::num_threads

   Default: ``1``

   Number of threads to be used for evaluating the Jacobian and
   estimation of covariance.

.. member:: SparseLinearAlgebraLibraryType Covariance::Options::sparse_linear_algebra_library_type

   Default: ``SUITE_SPARSE``, if Ceres Solver is built with support for
   `SuiteSparse <http://faculty.cse.tamu.edu/davis/suitesparse.html>`_,
   and ``EIGEN_SPARSE`` otherwise. Note that ``EIGEN_SPARSE`` is
   always available.

.. member:: CovarianceAlgorithmType Covariance::Options::algorithm_type

   Default: ``SPARSE_QR``

   Ceres supports two different algorithms for covariance estimation,
   which represent different tradeoffs in speed, accuracy and
   reliability.

   1. ``SPARSE_QR`` uses the sparse QR factorization algorithm to
      compute the decomposition

      .. math::

         QR &= J\\
         \left(J^\top J\right)^{-1} &= \left(R^\top R\right)^{-1}

      The speed of this algorithm depends on the sparse linear algebra
      library being used. ``Eigen``'s sparse QR factorization is a
      moderately fast algorithm suitable for small to medium sized
      matrices. For best performance we recommend using
      ``SuiteSparseQR`` which is enabled by setting
      :member:`Covariance::Options::sparse_linear_algebra_library_type`
      to ``SUITE_SPARSE``.

      ``SPARSE_QR`` cannot compute the covariance if the Jacobian is
      rank deficient.

   2. ``DENSE_SVD`` uses ``Eigen``'s ``JacobiSVD`` to perform the
      computations. It computes the singular value decomposition

      .. math:: U S V^\top = J

      and then uses it to compute the pseudo inverse of :math:`J'J` as

      .. math:: (J'J)^{\dagger} = V S^{\dagger} V^\top

      It is an accurate but slow method and should only be used for
      small to moderate sized problems. It can handle full-rank as
      well as rank deficient Jacobians.
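   Selecting the algorithm is a single option; a minimal sketch, with
   the surrounding problem setup assumed:

   .. code-block:: c++

      ceres::Covariance::Options options;
      // SPARSE_QR (the default) is fast but fails on rank deficient
      // Jacobians; DENSE_SVD is slow but can handle rank deficiency.
      options.algorithm_type = ceres::DENSE_SVD;
      ceres::Covariance covariance(options);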

.. member:: double Covariance::Options::min_reciprocal_condition_number

   Default: :math:`10^{-14}`

   If the Jacobian matrix is near singular, then inverting :math:`J'J`
   will result in unreliable results, e.g., if

   .. math::

      J = \begin{bmatrix}
          1.0 & 1.0 \\
          1.0 & 1.0000001
          \end{bmatrix}

   which is essentially a rank deficient matrix, we have

   .. math::

      (J'J)^{-1} = \begin{bmatrix}
                    2.0471e+14 & -2.0471e+14 \\
                   -2.0471e+14 &  2.0471e+14
                   \end{bmatrix}

   This is not a useful result. Therefore, by default
   :func:`Covariance::Compute` will return ``false`` if a rank
   deficient Jacobian is encountered. How rank deficiency is detected
   depends on the algorithm being used.

   1. ``DENSE_SVD``

      .. math:: \frac{\sigma_{\text{min}}}{\sigma_{\text{max}}} < \sqrt{\text{min_reciprocal_condition_number}}

      where :math:`\sigma_{\text{min}}` and
      :math:`\sigma_{\text{max}}` are the minimum and maximum
      singular values of :math:`J` respectively.

   2. ``SPARSE_QR``

      .. math:: \operatorname{rank}(J) < \operatorname{num\_col}(J)

      Here :math:`\operatorname{rank}(J)` is the estimate of the rank
      of :math:`J` returned by the sparse QR factorization
      algorithm. It is a fairly reliable indication of rank
      deficiency.
.. member:: int Covariance::Options::null_space_rank

   When using ``DENSE_SVD``, the user has more control in dealing
   with singular and near singular covariance matrices.

   As mentioned above, when the covariance matrix is near singular,
   instead of computing the inverse of :math:`J'J`, the Moore-Penrose
   pseudoinverse of :math:`J'J` should be computed.

   If :math:`J'J` has the eigen decomposition :math:`(\lambda_i,
   e_i)`, where :math:`\lambda_i` is the :math:`i^\textrm{th}`
   eigenvalue and :math:`e_i` is the corresponding eigenvector, then
   the inverse of :math:`J'J` is

   .. math:: (J'J)^{-1} = \sum_i \frac{1}{\lambda_i} e_i e_i'

   and computing the pseudo inverse involves dropping terms from this
   sum that correspond to small eigenvalues.

   How terms are dropped is controlled by
   `min_reciprocal_condition_number` and `null_space_rank`.

   If `null_space_rank` is non-negative, then the terms corresponding
   to the `null_space_rank` smallest eigenvalues are dropped
   irrespective of the magnitude of :math:`\lambda_i`. If the ratio
   of the smallest non-zero eigenvalue to the largest eigenvalue in
   the truncated matrix is still below
   `min_reciprocal_condition_number`, then `Covariance::Compute()`
   will fail and return ``false``.

   Setting `null_space_rank = -1` drops all terms for which

   .. math:: \frac{\lambda_i}{\lambda_{\textrm{max}}} < \textrm{min_reciprocal_condition_number}

   This option has no effect on ``SPARSE_QR``.
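   For instance, a sketch of how one might configure this for a gauge
   ambiguous structure from motion problem; the rank of 7, matching a
   similarity transform gauge, is illustrative:

   .. code-block:: c++

      ceres::Covariance::Options options;
      options.algorithm_type = ceres::DENSE_SVD;
      // Treat the 7 smallest eigenvalues (the similarity transform
      // gauge freedom) as the null space and drop them, or use -1 to
      // drop purely by the reciprocal condition number threshold.
      options.null_space_rank = 7;
      ceres::Covariance covariance(options);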

.. member:: bool Covariance::Options::apply_loss_function

   Default: ``true``

   Even though the residual blocks in the problem may contain loss
   functions, setting ``apply_loss_function`` to false will turn off
   the application of the loss function to the output of the cost
   function and in turn its effect on the covariance.
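   A minimal sketch:

   .. code-block:: c++

      ceres::Covariance::Options options;
      // Estimate the covariance of the raw residuals, ignoring any
      // robust loss functions attached to the residual blocks.
      options.apply_loss_function = false;
      ceres::Covariance covariance(options);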

.. class:: Covariance

   :class:`Covariance::Options`, as the name implies, is used to
   control the covariance estimation algorithm. Covariance estimation
   is a complicated and numerically sensitive procedure. Please read
   the entire documentation for :class:`Covariance::Options` before
   using :class:`Covariance`.

.. function:: bool Covariance::Compute(const vector<pair<const double*, const double*> >& covariance_blocks, Problem* problem)

   Compute a part of the covariance matrix.

   The vector ``covariance_blocks`` indexes into the covariance
   matrix block-wise using pairs of parameter blocks. This allows the
   covariance estimation algorithm to only compute and store these
   blocks.

   Since the covariance matrix is symmetric, if the user passes
   ``<block1, block2>``, then ``GetCovarianceBlock`` can be called with
   ``block1``, ``block2`` as well as ``block2``, ``block1``.

   ``covariance_blocks`` cannot contain duplicates. Bad things will
   happen if they do.

   Note that the list of ``covariance_blocks`` is only used to
   determine what parts of the covariance matrix are computed. The
   full Jacobian is used to do the computation, i.e. it does not have
   an impact on what part of the Jacobian is used for computation.

   The return value indicates the success or failure of the covariance
   computation. Please see the documentation for
   :class:`Covariance::Options` for more on the conditions under which
   this function returns ``false``.
.. function:: bool GetCovarianceBlock(const double* parameter_block1, const double* parameter_block2, double* covariance_block) const

   Return the block of the cross-covariance matrix corresponding to
   ``parameter_block1`` and ``parameter_block2``.

   ``Compute`` must be called before the first call to
   ``GetCovarianceBlock``, and the pair ``<parameter_block1,
   parameter_block2>`` OR the pair ``<parameter_block2,
   parameter_block1>`` must have been present in the vector
   ``covariance_blocks`` when ``Compute`` was called. Otherwise
   ``GetCovarianceBlock`` will return ``false``.

   ``covariance_block`` must point to a memory location that can store
   a ``parameter_block1_size x parameter_block2_size`` matrix. The
   returned covariance will be a row-major matrix.
.. function:: bool GetCovarianceBlockInTangentSpace(const double* parameter_block1, const double* parameter_block2, double* covariance_block) const

   Return the block of the cross-covariance matrix corresponding to
   ``parameter_block1`` and ``parameter_block2``. Returns the
   cross-covariance in the tangent space if a local parameterization
   is associated with either parameter block; else returns the
   cross-covariance in the ambient space.

   ``Compute`` must be called before the first call to
   ``GetCovarianceBlockInTangentSpace``, and the pair
   ``<parameter_block1, parameter_block2>`` OR the pair
   ``<parameter_block2, parameter_block1>`` must have been present in
   the vector ``covariance_blocks`` when ``Compute`` was
   called. Otherwise ``GetCovarianceBlockInTangentSpace`` will return
   ``false``.

   ``covariance_block`` must point to a memory location that can store
   a ``parameter_block1_local_size x parameter_block2_local_size``
   matrix. The returned covariance will be a row-major matrix.
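   For example, a sketch continuing the quaternion example from above,
   where ``q`` is the hypothetical 4-element quaternion block with a
   local parameterization attached, so its tangent space is
   3-dimensional:

   .. code-block:: c++

      // The tangent space covariance block of a quaternion with a
      // local parameterization is 3x3, not 4x4.
      double covariance_qq[3 * 3];
      covariance.GetCovarianceBlockInTangentSpace(q, q, covariance_qq);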

Example Usage
=============

.. code-block:: c++

   double x[3];
   double y[2];

   Problem problem;
   problem.AddParameterBlock(x, 3);
   problem.AddParameterBlock(y, 2);
   <Build Problem>
   <Solve Problem>

   Covariance::Options options;
   Covariance covariance(options);

   vector<pair<const double*, const double*> > covariance_blocks;
   covariance_blocks.push_back(make_pair(x, x));
   covariance_blocks.push_back(make_pair(y, y));
   covariance_blocks.push_back(make_pair(x, y));

   CHECK(covariance.Compute(covariance_blocks, &problem));

   double covariance_xx[3 * 3];
   double covariance_yy[2 * 2];
   double covariance_xy[3 * 2];
   covariance.GetCovarianceBlock(x, x, covariance_xx);
   covariance.GetCovarianceBlock(y, y, covariance_yy);
   covariance.GetCovarianceBlock(x, y, covariance_xy);
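The returned blocks are plain row-major ``double`` arrays. A small
usage sketch of inspecting them with ``Eigen::Map``; this is a
convenience, not something the API requires:

.. code-block:: c++

   // View the 3x3 covariance block of x without copying. The RowMajor
   // flag must match the row-major layout returned by Covariance.
   Eigen::Map<Eigen::Matrix<double, 3, 3, Eigen::RowMajor> >
       cov_xx(covariance_xx);
   std::cout << "Variance of x[0]: " << cov_xx(0, 0) << "\n";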