Blame - third_party/eigen/doc/HiPerformance.dox - RealtimeRoboticsGroup/test

blob: ab6cdfd44888072e9b0cd55324f86dcbabfc99e9 [file] [log] [blame]

Brian Silverman	72890c2	2015-09-19 14:37:37 -0400	[diff] [blame]	1
				2	namespace Eigen {
				3
				4	/** \page TopicWritingEfficientProductExpression Writing efficient matrix product expressions
				5
				6	In general achieving good performance with Eigen does no require any special effort:
				7	simply write your expressions in the most high level way. This is especially true
				8	for small fixed size matrices. For large matrices, however, it might be useful to
				9	take some care when writing your expressions in order to minimize useless evaluations
				10	and optimize the performance.
				11	In this page we will give a brief overview of the Eigen's internal mechanism to simplify
				12	and evaluate complex product expressions, and discuss the current limitations.
				13	In particular we will focus on expressions matching level 2 and 3 BLAS routines, i.e,
				14	all kind of matrix products and triangular solvers.
				15
				16	Indeed, in Eigen we have implemented a set of highly optimized routines which are very similar
				17	to BLAS's ones. Unlike BLAS, those routines are made available to user via a high level and
				18	natural API. Each of these routines can compute in a single evaluation a wide variety of expressions.
				19	Given an expression, the challenge is then to map it to a minimal set of routines.
				20	As explained latter, this mechanism has some limitations, and knowing them will allow
				21	you to write faster code by making your expressions more Eigen friendly.
				22
				23	\section GEMM General Matrix-Matrix product (GEMM)
				24
				25	Let's start with the most common primitive: the matrix product of general dense matrices.
				26	In the BLAS world this corresponds to the GEMM routine. Our equivalent primitive can
				27	perform the following operation:
				28	\f$ C.noalias() += \alpha op1(A) op2(B) \f$
				29	where A, B, and C are column and/or row major matrices (or sub-matrices),
				30	alpha is a scalar value, and op1, op2 can be transpose, adjoint, conjugate, or the identity.
				31	When Eigen detects a matrix product, it analyzes both sides of the product to extract a
				32	unique scalar factor alpha, and for each side, its effective storage order, shape, and conjugation states.
				33	More precisely each side is simplified by iteratively removing trivial expressions such as scalar multiple,
				34	negation and conjugation. Transpose and Block expressions are not evaluated and they only modify the storage order
				35	and shape. All other expressions are immediately evaluated.
				36	For instance, the following expression:
				37	\code m1.noalias() -= s4 * (s1 * m2.adjoint() * (-(s3m3).conjugate()s2)) \endcode
				38	is automatically simplified to:
				39	\code m1.noalias() += (s1s2conj(s3)s4) m2.adjoint() * m3.conjugate() \endcode
				40	which exactly matches our GEMM routine.
				41
				42	\subsection GEMM_Limitations Limitations
				43	Unfortunately, this simplification mechanism is not perfect yet and not all expressions which could be
				44	handled by a single GEMM-like call are correctly detected.
				45	<table class="manual" style="width:100%">
				46	<tr>
				47	<th>Not optimal expression</th>
				48	<th>Evaluated as</th>
				49	<th>Optimal version (single evaluation)</th>
				50	<th>Comments</th>
				51	</tr>
				52	<tr>
				53	<td>\code
				54	m1 += m2 * m3; \endcode</td>
				55	<td>\code
				56	temp = m2 * m3;
				57	m1 += temp; \endcode</td>
				58	<td>\code
				59	m1.noalias() += m2 * m3; \endcode</td>
				60	<td>Use .noalias() to tell Eigen the result and right-hand-sides do not alias.
				61	Otherwise the product m2 * m3 is evaluated into a temporary.</td>
				62	</tr>
				63	<tr class="alt">
				64	<td></td>
				65	<td></td>
				66	<td>\code
				67	m1.noalias() += s1 * (m2 * m3); \endcode</td>
				68	<td>This is a special feature of Eigen. Here the product between a scalar
				69	and a matrix product does not evaluate the matrix product but instead it
				70	returns a matrix product expression tracking the scalar scaling factor. <br>
				71	Without this optimization, the matrix product would be evaluated into a
				72	temporary as in the next example.</td>
				73	</tr>
				74	<tr>
				75	<td>\code
				76	m1.noalias() += (m2 * m3).adjoint(); \endcode</td>
				77	<td>\code
				78	temp = m2 * m3;
				79	m1 += temp.adjoint(); \endcode</td>
				80	<td>\code
				81	m1.noalias() += m3.adjoint()
				82	* * m2.adjoint(); \endcode</td>
				83	<td>This is because the product expression has the EvalBeforeNesting bit which
				84	enforces the evaluation of the product by the Tranpose expression.</td>
				85	</tr>
				86	<tr class="alt">
				87	<td>\code
				88	m1 = m1 + m2 * m3; \endcode</td>
				89	<td>\code
				90	temp = m2 * m3;
				91	m1 = m1 + temp; \endcode</td>
				92	<td>\code m1.noalias() += m2 * m3; \endcode</td>
				93	<td>Here there is no way to detect at compile time that the two m1 are the same,
				94	and so the matrix product will be immediately evaluated.</td>
				95	</tr>
				96	<tr>
				97	<td>\code
				98	m1.noalias() = m4 + m2 * m3; \endcode</td>
				99	<td>\code
				100	temp = m2 * m3;
				101	m1 = m4 + temp; \endcode</td>
				102	<td>\code
				103	m1 = m4;
				104	m1.noalias() += m2 * m3; \endcode</td>
				105	<td>First of all, here the .noalias() in the first expression is useless because
				106	m2*m3 will be evaluated anyway. However, note how this expression can be rewritten
				107	so that no temporary is required. (tip: for very small fixed size matrix
				108	it is slighlty better to rewrite it like this: m1.noalias() = m2 * m3; m1 += m4;</td>
				109	</tr>
				110	<tr class="alt">
				111	<td>\code
				112	m1.noalias() += (s1m2).block(..) m3; \endcode</td>
				113	<td>\code
				114	temp = (s1*m2).block(..);
				115	m1 += temp * m3; \endcode</td>
				116	<td>\code
				117	m1.noalias() += s1 * m2.block(..) * m3; \endcode</td>
				118	<td>This is because our expression analyzer is currently not able to extract trivial
				119	expressions nested in a Block expression. Therefore the nested scalar
				120	multiple cannot be properly extracted.</td>
				121	</tr>
				122	</table>
				123
				124	Of course all these remarks hold for all other kind of products involving triangular or selfadjoint matrices.
				125
				126	*/
				127
				128	}