Brian Silverman | 72890c2 | 2015-09-19 14:37:37 -0400 | [diff] [blame] | 1 | |
| 2 | namespace Eigen { |
| 3 | |
| 4 | /** \page TopicWritingEfficientProductExpression Writing efficient matrix product expressions |
| 5 | |
| 6 | In general achieving good performance with Eigen does no require any special effort: |
| 7 | simply write your expressions in the most high level way. This is especially true |
| 8 | for small fixed size matrices. For large matrices, however, it might be useful to |
| 9 | take some care when writing your expressions in order to minimize useless evaluations |
| 10 | and optimize the performance. |
| 11 | In this page we will give a brief overview of the Eigen's internal mechanism to simplify |
| 12 | and evaluate complex product expressions, and discuss the current limitations. |
| 13 | In particular we will focus on expressions matching level 2 and 3 BLAS routines, i.e, |
| 14 | all kind of matrix products and triangular solvers. |
| 15 | |
| 16 | Indeed, in Eigen we have implemented a set of highly optimized routines which are very similar |
| 17 | to BLAS's ones. Unlike BLAS, those routines are made available to user via a high level and |
| 18 | natural API. Each of these routines can compute in a single evaluation a wide variety of expressions. |
| 19 | Given an expression, the challenge is then to map it to a minimal set of routines. |
| 20 | As explained latter, this mechanism has some limitations, and knowing them will allow |
| 21 | you to write faster code by making your expressions more Eigen friendly. |
| 22 | |
| 23 | \section GEMM General Matrix-Matrix product (GEMM) |
| 24 | |
| 25 | Let's start with the most common primitive: the matrix product of general dense matrices. |
| 26 | In the BLAS world this corresponds to the GEMM routine. Our equivalent primitive can |
| 27 | perform the following operation: |
| 28 | \f$ C.noalias() += \alpha op1(A) op2(B) \f$ |
| 29 | where A, B, and C are column and/or row major matrices (or sub-matrices), |
| 30 | alpha is a scalar value, and op1, op2 can be transpose, adjoint, conjugate, or the identity. |
| 31 | When Eigen detects a matrix product, it analyzes both sides of the product to extract a |
| 32 | unique scalar factor alpha, and for each side, its effective storage order, shape, and conjugation states. |
| 33 | More precisely each side is simplified by iteratively removing trivial expressions such as scalar multiple, |
| 34 | negation and conjugation. Transpose and Block expressions are not evaluated and they only modify the storage order |
| 35 | and shape. All other expressions are immediately evaluated. |
| 36 | For instance, the following expression: |
| 37 | \code m1.noalias() -= s4 * (s1 * m2.adjoint() * (-(s3*m3).conjugate()*s2)) \endcode |
| 38 | is automatically simplified to: |
| 39 | \code m1.noalias() += (s1*s2*conj(s3)*s4) * m2.adjoint() * m3.conjugate() \endcode |
| 40 | which exactly matches our GEMM routine. |
| 41 | |
| 42 | \subsection GEMM_Limitations Limitations |
| 43 | Unfortunately, this simplification mechanism is not perfect yet and not all expressions which could be |
| 44 | handled by a single GEMM-like call are correctly detected. |
| 45 | <table class="manual" style="width:100%"> |
| 46 | <tr> |
| 47 | <th>Not optimal expression</th> |
| 48 | <th>Evaluated as</th> |
| 49 | <th>Optimal version (single evaluation)</th> |
| 50 | <th>Comments</th> |
| 51 | </tr> |
| 52 | <tr> |
| 53 | <td>\code |
| 54 | m1 += m2 * m3; \endcode</td> |
| 55 | <td>\code |
| 56 | temp = m2 * m3; |
| 57 | m1 += temp; \endcode</td> |
| 58 | <td>\code |
| 59 | m1.noalias() += m2 * m3; \endcode</td> |
| 60 | <td>Use .noalias() to tell Eigen the result and right-hand-sides do not alias. |
| 61 | Otherwise the product m2 * m3 is evaluated into a temporary.</td> |
| 62 | </tr> |
| 63 | <tr class="alt"> |
| 64 | <td></td> |
| 65 | <td></td> |
| 66 | <td>\code |
| 67 | m1.noalias() += s1 * (m2 * m3); \endcode</td> |
| 68 | <td>This is a special feature of Eigen. Here the product between a scalar |
| 69 | and a matrix product does not evaluate the matrix product but instead it |
| 70 | returns a matrix product expression tracking the scalar scaling factor. <br> |
| 71 | Without this optimization, the matrix product would be evaluated into a |
| 72 | temporary as in the next example.</td> |
| 73 | </tr> |
| 74 | <tr> |
| 75 | <td>\code |
| 76 | m1.noalias() += (m2 * m3).adjoint(); \endcode</td> |
| 77 | <td>\code |
| 78 | temp = m2 * m3; |
| 79 | m1 += temp.adjoint(); \endcode</td> |
| 80 | <td>\code |
| 81 | m1.noalias() += m3.adjoint() |
| 82 | * * m2.adjoint(); \endcode</td> |
| 83 | <td>This is because the product expression has the EvalBeforeNesting bit which |
| 84 | enforces the evaluation of the product by the Tranpose expression.</td> |
| 85 | </tr> |
| 86 | <tr class="alt"> |
| 87 | <td>\code |
| 88 | m1 = m1 + m2 * m3; \endcode</td> |
| 89 | <td>\code |
| 90 | temp = m2 * m3; |
| 91 | m1 = m1 + temp; \endcode</td> |
| 92 | <td>\code m1.noalias() += m2 * m3; \endcode</td> |
| 93 | <td>Here there is no way to detect at compile time that the two m1 are the same, |
| 94 | and so the matrix product will be immediately evaluated.</td> |
| 95 | </tr> |
| 96 | <tr> |
| 97 | <td>\code |
| 98 | m1.noalias() = m4 + m2 * m3; \endcode</td> |
| 99 | <td>\code |
| 100 | temp = m2 * m3; |
| 101 | m1 = m4 + temp; \endcode</td> |
| 102 | <td>\code |
| 103 | m1 = m4; |
| 104 | m1.noalias() += m2 * m3; \endcode</td> |
| 105 | <td>First of all, here the .noalias() in the first expression is useless because |
| 106 | m2*m3 will be evaluated anyway. However, note how this expression can be rewritten |
| 107 | so that no temporary is required. (tip: for very small fixed size matrix |
| 108 | it is slighlty better to rewrite it like this: m1.noalias() = m2 * m3; m1 += m4;</td> |
| 109 | </tr> |
| 110 | <tr class="alt"> |
| 111 | <td>\code |
| 112 | m1.noalias() += (s1*m2).block(..) * m3; \endcode</td> |
| 113 | <td>\code |
| 114 | temp = (s1*m2).block(..); |
| 115 | m1 += temp * m3; \endcode</td> |
| 116 | <td>\code |
| 117 | m1.noalias() += s1 * m2.block(..) * m3; \endcode</td> |
| 118 | <td>This is because our expression analyzer is currently not able to extract trivial |
| 119 | expressions nested in a Block expression. Therefore the nested scalar |
| 120 | multiple cannot be properly extracted.</td> |
| 121 | </tr> |
| 122 | </table> |
| 123 | |
| 124 | Of course all these remarks hold for all other kind of products involving triangular or selfadjoint matrices. |
| 125 | |
| 126 | */ |
| 127 | |
| 128 | } |