Blame - third_party/gmp/doc/projects.html - RealtimeRoboticsGroup/test

Austin Schuh	dace2a6	2020-08-18 10:56:48 -0700	[diff] [blame^]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
				2	<html>
				3	<head>
				4	<title>GMP Development Projects</title>
				5	<link rel="shortcut icon" href="favicon.ico">
				6	<link rel="stylesheet" href="gmp.css">
				7	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
				8	</head>
				9
				10	<center>
				11	<h1>
				12	GMP Development Projects
				13	</h1>
				14	</center>
				15
				16	<font size=-1>
				17	<pre>
				18	Copyright 2000-2006, 2008-2011 Free Software Foundation, Inc.
				19
				20	This file is part of the GNU MP Library.
				21
				22	The GNU MP Library is free software; you can redistribute it and/or modify
				23	it under the terms of either:
				24
				25	* the GNU Lesser General Public License as published by the Free
				26	Software Foundation; either version 3 of the License, or (at your
				27	option) any later version.
				28
				29	or
				30
				31	* the GNU General Public License as published by the Free Software
				32	Foundation; either version 2 of the License, or (at your option) any
				33	later version.
				34
				35	or both in parallel, as here.
				36
				37	The GNU MP Library is distributed in the hope that it will be useful, but
				38	WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
				39	or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
				40	for more details.
				41
				42	You should have received copies of the GNU General Public License and the
				43	GNU Lesser General Public License along with the GNU MP Library. If not,
				44	see https://www.gnu.org/licenses/.
				45	</pre>
				46	</font>
				47
				48	<hr>
				49	<!-- NB. timestamp updated automatically by emacs -->
				50	This file current as of 29 Jan 2014. An up-to-date version is available at
				51	<a href="https://gmplib.org/projects.html">https://gmplib.org/projects.html</a>.
				52	Please send comments about this page to gmp-devel<font>@</font>gmplib.org.
				53
				54	<p> This file lists projects suitable for volunteers. Please see the
				55	<a href="tasks.html">tasks file</a> for smaller tasks.
				56
				57	<p> If you want to work on any of the projects below, please let
				58	gmp-devel<font>@</font>gmplib.org know. If you want to help with a project
that somebody else is already working on, you can get in touch through
				60	gmp-devel<font>@</font>gmplib.org. (There are no email addresses of
				61	volunteers below, due to spamming problems.)
				62
				63	<ul>
				64	<li> <strong>Faster multiplication</strong>
				65
				66	<ol>
				67
				68	<li> Work on the algorithm selection code for unbalanced multiplication.
				69
<li> Implement an FFT variant computing the coefficients mod m different
limb-sized primes of the form l*2^k+1, i.e., compute m separate FFTs.
				72	The wanted coefficients will at the end be found by lifting with CRT
				73	(Chinese Remainder Theorem). If we let m = 3, i.e., use 3 primes, we
				74	can split the operands into coefficients at limb boundaries, and if
				75	our machine uses b-bit limbs, we can multiply numbers with close to
				76	2^b limbs without coefficient overflow. For smaller multiplication,
				77	we might perhaps let m = 1, and instead of splitting our operands at
				78	limb boundaries, split them in much smaller pieces. We might also use
				79	4 or more primes, and split operands into bigger than b-bit chunks.
By using more primes, we gain a shorter transform length but lose by
having to do more FFTs; overall that is a slight saving. We then
lose again through a more expensive CRT. <br><br>
				83
				84	<p> [We now have two implementations of this algorithm, one by Tommy
				85	Färnqvist and one by Niels Möller.]
				86
<li> Work on short products. Our mullo and mulmid are probably OK, but we
				88	lack mulhi.
				89
				90	</ol>
				91
				92	<p> Another possibility would be an optimized cube. In the basecase that
				93	should definitely be able to save cross-products in a similar fashion to
				94	squaring, but some investigation might be needed for how best to adapt
				95	the higher-order algorithms. Not sure whether cubing or further small
				96	powers have any particularly important uses though.
				97
				98
				99	<li> <strong>Assembly routines</strong>
				100
				101	<p> Write new and improve existing assembly routines. The tests/devel
				102	programs and the tune/speed.c and tune/many.pl programs are useful for
				103	testing and timing the routines you write. See the README files in those
				104	directories for more information.
				105
				106	<p> Please make sure your new routines are fast for these three situations:
				107	<ol>
				108	<li> Small operands of less than, say, 10 limbs.
				109	<li> Medium size operands, that fit into the cache.
<li> Huge operands that do not fit into the cache.
				111	</ol>
				112
				113	<p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
				114	mpn_sqr_basecase. The latter two don't exist for all machines, while
				115	mpn_addmul_1 exists for almost all machines.
				116
				117	<p> Standard techniques for these routines are unrolling, software
				118	pipelining, and specialization for common operand values. For machines
				119	with poor integer multiplication, it is sometimes possible to remedy the
situation using floating-point operations or SIMD operations such as MMX
(x86), SSE (x86), VMX (PowerPC), VIS (Sparc).
				122
				123	<p> Using floating-point operations is interesting but somewhat tricky.
Since IEEE double has 53 bits of mantissa, one has to split the operands
into small pieces, so that no intermediates are greater than 2^53. For
				126	32-bit computers, splitting one operand into 16-bit pieces works. For
				127	64-bit machines, one operand can be split into 21-bit pieces and the
				128	other into 32-bit pieces. (A 64-bit operand can be split into just three
				129	21-bit pieces if one allows the split operands to be negative!)
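<p> As a minimal sketch of the 32-bit case (the function name
<code>mul32_via_double</code> is hypothetical and not part of GMP), one
operand can be split into 16-bit pieces so that each partial product stays
below 2^48 and is therefore exact in a double:

```c
#include <stdint.h>

/* Multiply two 32-bit values using floating-point multiplies only.
   b is split into two 16-bit pieces, so each partial product is below
   2^32 * 2^16 = 2^48 and is represented exactly in a 53-bit mantissa. */
uint64_t mul32_via_double(uint32_t a, uint32_t b)
{
    double lo = (double)a * (double)(b & 0xFFFFu);  /* a * low 16 bits of b  */
    double hi = (double)a * (double)(b >> 16);      /* a * high 16 bits of b */

    /* Recombine in integer arithmetic; the total is a*b < 2^64. */
    return (uint64_t)lo + ((uint64_t)hi << 16);
}
```

<p> A real assembly routine would keep the accumulation in floating-point
registers for as long as the sums stay below 2^53; this sketch only shows
why the 16-bit split is safe.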
				130
				131
				132	<li> <strong>Faster sqrt</strong>
				133
				134	<p> The current code uses divisions, which are reasonably fast, but it'd be
				135	possible to use only multiplications by computing 1/sqrt(A) using this
				136	iteration:
<pre>
x<sub>i+1</sub> = x<sub>i</sub> (3 − A x<sub>i</sub><sup>2</sup>)/2
</pre>
The square root can then be computed like this:
<pre>
sqrt(A) = A x<sub>n</sub>
</pre>
				145	<p> That final multiply might be the full size of the input (though it might
				146	only need the high half of that), so there may or may not be any speedup
				147	overall.
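<p> The iteration can be tried out in ordinary double precision. The sketch
below is only an illustration of the recurrence, not GMP code; the function
name and the caller-supplied starting guess are assumptions.

```c
/* Approximate sqrt(a) for a > 0 without dividing by a.
   x0 is a starting guess for 1/sqrt(a); any 0 < x0 < sqrt(3/a) converges.
   The only "division" is the halving, which is a shift in a bignum setting. */
double sqrt_via_rsqrt(double a, double x0, int iterations)
{
    double x = x0;
    int i;

    for (i = 0; i < iterations; i++)
        x = x * (3.0 - a * x * x) / 2.0;   /* x converges to 1/sqrt(a) */

    return a * x;                          /* sqrt(a) = a * (1/sqrt(a)) */
}
```

<p> For example, <code>sqrt_via_rsqrt(2.0, 0.5, 8)</code> approaches the square
root of 2. Convergence is quadratic once the guess is close, so in a
multiprecision setting the working precision could be doubled at each step.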
				148
				149	<p> We should probably allow a special exponent-like parameter, to speed
				150	computations of a precise square root of a small number in mpf and mpfr.
				151
				152
				153	<li> <strong>Nth root</strong>
				154
				155	<p> Improve mpn_rootrem. The current code is not too bad, but its time
				156	complexity is a function of the input, while it is possible to make
				157	the <i>average</i> complexity a function of the output.
				158
				159
				160	<li> <strong>Fat binaries</strong>
				161
				162	<p> Add more functions to the set of fat functions.
				163
				164	<p> The speed of multiplication is today highly dependent on combination
				165	functions like <code>addlsh1_n</code>. A fat binary will never use any such
				166	functions, since they are classified as optional. Ideally, we should use
them, by making the current compile-time selections of optional functions
				168	become run-time selections for fat binaries.
				169
<p> If we make fat binaries work really well, we should move away from the
				171	current configure scheme (at least by default) and instead include all code
				172	always.
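<p> One way to picture a run-time selection is a function pointer that
resolves itself on first use. The following is a sketch under assumptions:
every name in it (<code>fat_add_n</code>, <code>cpu_has_fast_add_n</code>
and so on) is hypothetical, and GMP's actual fat mechanism may differ.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;
typedef limb_t (*add_n_fn)(limb_t *, const limb_t *, const limb_t *, size_t);

/* Portable fallback: rp[] = ap[] + bp[], returning the carry out. */
static limb_t add_n_generic(limb_t *rp, const limb_t *ap,
                            const limb_t *bp, size_t n)
{
    limb_t carry = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        limb_t s = ap[i] + carry;
        limb_t c1 = s < carry;      /* overflow from adding the carry in */
        limb_t r = s + bp[i];
        limb_t c2 = r < s;          /* overflow from adding bp[i] */
        rp[i] = r;
        carry = c1 | c2;            /* at most one of the two can be set */
    }
    return carry;
}

/* Hypothetical CPU probe; a real one would query CPUID or similar. */
static int cpu_has_fast_add_n(void) { return 0; }

static limb_t add_n_resolve(limb_t *, const limb_t *, const limb_t *, size_t);

/* Starts out pointing at the resolver; after the first call it points
   straight at the chosen implementation. */
static add_n_fn add_n_ptr = add_n_resolve;

static limb_t add_n_resolve(limb_t *rp, const limb_t *ap,
                            const limb_t *bp, size_t n)
{
    /* Only the generic routine exists in this sketch, so both branches
       pick it; a fat build would name a CPU-specific routine here. */
    add_n_ptr = cpu_has_fast_add_n() ? add_n_generic : add_n_generic;
    return add_n_ptr(rp, ap, bp, n);
}

limb_t fat_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    return add_n_ptr(rp, ap, bp, n);
}
```

<p> An optional function could be handled the same way, with the resolver
falling back to a composition of mandatory routines when no native version
is available for the detected CPU.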
				173
				174
				175	<li> <strong>Exceptions</strong>
				176
<p> Some sort of scheme for exception handling would be desirable.
				178	Presently the only thing documented is that divide by zero in GMP
				179	functions provokes a deliberate machine divide by zero (on those systems
				180	where such a thing exists at least). The global <code>gmp_errno</code>
				181	is not actually documented, except for the old <code>gmp_randinit</code>
				182	function. Being currently just a plain global means it's not
				183	thread-safe.
				184
				185	<p> The basic choices for exceptions are returning an error code or having a
				186	handler function to be called. The disadvantage of error returns is they
				187	have to be checked, leading to tedious and rarely executed code, and
				188	strictly speaking such a scheme wouldn't be source or binary compatible.
				189	The disadvantage of a handler function is that a <code>longjmp</code> or
				190	similar recovery from it may be difficult. A combination would be
				191	possible, for instance by allowing the handler to return an error code.
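<p> The combined scheme might look roughly like the sketch below. None of
these names exist in GMP; <code>set_error_handler</code>,
<code>err_code</code> and <code>checked_div</code> are invented purely to
illustrate a handler whose return value becomes the error code.

```c
#include <stddef.h>

typedef enum {
    ERR_NONE = 0,
    ERR_DIVIDE_BY_ZERO,
    ERR_SQRT_OF_NEGATIVE
} err_code;

/* A handler may log, longjmp away, or simply return a code to the caller. */
typedef err_code (*err_handler)(err_code code, const char *where);

static err_handler current_handler = NULL;

/* Install a handler and return the previous one (NULL means "none"). */
err_handler set_error_handler(err_handler h)
{
    err_handler old = current_handler;
    current_handler = h;
    return old;
}

/* With no handler installed the code is simply returned to the caller. */
static err_code raise_error(err_code code, const char *where)
{
    return current_handler ? current_handler(code, where) : code;
}

/* Operand range errors are detected before any state is modified. */
err_code checked_div(long *quotient, long n, long d)
{
    if (d == 0)
        return raise_error(ERR_DIVIDE_BY_ZERO, "checked_div");
    *quotient = n / d;
    return ERR_NONE;
}
```

<p> Note that a plain global handler pointer like this one has the same
thread-safety problem as a plain global error variable.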
				192
				193	<p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
				194	normally be detected at the start of functions, so exception handling
				195	would have a clean state. What's worth considering though is that the
				196	GMP function detecting the exception may have been called via some third
party library or self-contained application module, and hence have
				198	various bits of state to be cleaned up above it. It'd be highly
				199	desirable for an exceptions scheme to allow for such cleanups.
				200
				201	<p> The C++ destructor mechanism could help with cleanups both internally and
				202	externally, but being a plain C library we don't want to depend on that.
				203
				204	<p> A C++ <code>throw</code> might be a good optional extra exceptions
				205	mechanism, perhaps under a build option. For
				206	GCC <code>-fexceptions</code> will add the necessary frame information to
				207	plain C code, or GMP could be compiled as C++.
				208
				209	<p> Out-of-memory exceptions are expected to be handled by the
				210	<code>mp_set_memory_functions</code> routines, rather than being a
				211	prospective part of divide-by-zero etc. Some similar considerations
				212	apply but what differs is that out-of-memory can arise deep within GMP
				213	internals. Even fundamental routines like <code>mpn_add_n</code> and
				214	<code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
				215	vector systems). Allowing for an error code return would require an
				216	awful lot of checking internally. Perhaps it'd still be worthwhile, but
				217	it'd be a lot of changes and the extra code would probably be rather
				218	rarely executed in normal usages.
				219
				220	<p> A <code>longjmp</code> recovery for out-of-memory will currently, in
				221	general, lead to memory leaks and may leave GMP variables operated on in
				222	inconsistent states. Maybe it'd be possible to record recovery
				223	information for use by the relevant allocate or reallocate function, but
				224	that too would be a lot of changes.
				225
				226	<p> One scheme for out-of-memory would be to note that all GMP allocations go
				227	through the <code>mp_set_memory_functions</code> routines. So if the
				228	application has an intended <code>setjmp</code> recovery point it can
				229	record memory activity by GMP and abandon space allocated and variables
				230	initialized after that point. This might be as simple as directing the
				231	allocation functions to a separate pool, but in general would have the
				232	disadvantage of needing application-level bookkeeping on top of the
				233	normal system <code>malloc</code>. An advantage however is that it needs
				234	nothing from GMP itself and on that basis doesn't burden applications not
needing recovery. Note that there are probably some details to be worked
				236	out here about reallocs of existing variables, and perhaps about copying
				237	or swapping between "permanent" and "temporary" variables.
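<p> A separate pool of this kind could be sketched as follows. The three
functions have the argument shapes that <code>mp_set_memory_functions</code>
expects, but everything else (the names, the list layout, the single global
<code>recovery_point</code>) is an assumption made for illustration. The
application must call <code>setjmp(recovery_point)</code> before any
allocation can fail.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every block so the pool can find it again. */
typedef struct block { struct block *prev, *next; } block;

static block pool = { &pool, &pool };  /* circular list; empty when self-linked */
static jmp_buf recovery_point;         /* set by the application with setjmp */
static size_t live_blocks = 0;

static void link_block(block *b)
{
    b->next = pool.next;
    b->prev = &pool;
    pool.next->prev = b;
    pool.next = b;
    live_blocks++;
}

static void unlink_block(block *b)
{
    b->prev->next = b->next;
    b->next->prev = b->prev;
    live_blocks--;
}

void *pool_alloc(size_t size)
{
    block *b = malloc(sizeof(block) + size);
    if (b == NULL)
        longjmp(recovery_point, 1);
    link_block(b);
    return b + 1;                      /* user data follows the header */
}

void *pool_realloc(void *p, size_t old_size, size_t new_size)
{
    block *b = (block *)p - 1, *nb;
    (void)old_size;
    unlink_block(b);                   /* realloc may move the block */
    nb = realloc(b, sizeof(block) + new_size);
    if (nb == NULL) {
        link_block(b);                 /* old block is still valid */
        longjmp(recovery_point, 1);
    }
    link_block(nb);
    return nb + 1;
}

void pool_free(void *p, size_t size)
{
    block *b = (block *)p - 1;
    (void)size;
    unlink_block(b);
    free(b);
}

/* Called after the longjmp lands: release everything still in the pool. */
void pool_abandon(void)
{
    while (pool.next != &pool) {
        block *b = pool.next;
        unlink_block(b);
        free(b);
    }
}
```

<p> After <code>pool_abandon</code>, any GMP variable whose storage lived in
the pool is dangling and must be treated as uninitialized, which is exactly
the bookkeeping burden mentioned above.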
				238
				239	<p> Applications desiring a fine-grained error control, for instance a
				240	language interpreter, would very possibly not be well served by a scheme
				241	requiring <code>longjmp</code>. Wrapping every GMP function call with a
				242	<code>setjmp</code> would be very inconvenient.
				243
				244	<p> Another option would be to let <code>mpz_t</code> etc hold a sort of NaN,
				245	a special value indicating an out-of-memory or other failure. This would
				246	be similar to NaNs in mpfr. Unfortunately such a scheme could only be
				247	used by programs prepared to handle such special values, since for
				248	instance a program waiting for some condition to be satisfied could
				249	become an infinite loop if it wasn't also watching for NaNs. The work to
				250	implement this would be significant too, lots of checking of inputs and
				251	intermediate results. And if <code>mpn</code> routines were to
				252	participate in this (which they would have to internally) a lot of new
				253	return values would need to be added, since of course there's no
				254	<code>mpz_t</code> etc structure for them to indicate failure in.
				255
				256	<p> Stack overflow is another possible exception, but perhaps not one that
				257	can be easily detected in general. On i386 GNU/Linux for instance GCC
				258	normally doesn't generate stack probes for an <code>alloca</code>, but
				259	merely adjusts <code>%esp</code>. A big enough <code>alloca</code> can
				260	miss the stack redzone and hit arbitrary data. GMP stack usage is
				261	normally a function of operand size, which might be enough for some
				262	applications to know they'll be safe. Otherwise a fixed maximum usage
				263	can probably be obtained by building with
				264	<code>--enable-alloca=malloc-reentrant</code> (or
				265	<code>notreentrant</code>). Arranging the default to be
				266	<code>alloca</code> only on blocks up to a certain size and
				267	<code>malloc</code> thereafter might be a better approach and would have
				268	the advantage of not having calculations limited by available stack.
				269
				270	<p> Actually recovering from stack overflow is of course another problem. It
				271	might be possible to catch a <code>SIGSEGV</code> in the stack redzone
				272	and do something in a <code>sigaltstack</code>, on systems which have
				273	that, but recovery might otherwise not be possible. This is worth
				274	bearing in mind because there's no point worrying about tight and careful
				275	out-of-memory recovery if an out-of-stack is fatal.
				276
				277	<p> Operand overflow is another exception to be addressed. It's easy for
				278	instance to ask <code>mpz_pow_ui</code> for a result bigger than an
				279	<code>mpz_t</code> can possibly represent. Currently overflows in limb
				280	or byte count calculations will go undetected. Often they'll still end
				281	up asking the memory functions for blocks bigger than available memory,
				282	but that's by no means certain and results are unpredictable in general.
				283	It'd be desirable to tighten up such size calculations. Probably only
				284	selected routines would need checks, if it's assumed say that no input
				285	will be more than half of all memory and hence size additions like say
				286	<code>mpz_mul</code> won't overflow.
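<p> Tightened size calculations could be as simple as the checks sketched
here. The helper names are hypothetical, and unsigned <code>size_t</code>
counts are assumed for simplicity even though GMP's own size types are
signed.

```c
#include <stddef.h>
#include <stdint.h>

/* Store limbs * limb_bytes in *bytes and return 1, or return 0 if the
   byte count would not fit in a size_t. */
int limbs_to_bytes(size_t limbs, size_t limb_bytes, size_t *bytes)
{
    if (limb_bytes != 0 && limbs > SIZE_MAX / limb_bytes)
        return 0;
    *bytes = limbs * limb_bytes;
    return 1;
}

/* A product of an an-limb and a bn-limb operand needs at most an + bn
   limbs; return 0 if that sum itself overflows. */
int mul_result_limbs(size_t an, size_t bn, size_t *rn)
{
    if (an > SIZE_MAX - bn)
        return 0;
    *rn = an + bn;
    return 1;
}
```

<p> Under the assumption that no input occupies more than half of memory,
the second check can never fail, which is why only selected routines such
as powering would need the first kind.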
				287
				288
				289	<li> <strong>Performance Tool</strong>
				290
				291	<p> It'd be nice to have some sort of tool for getting an overview of
				292	performance. Clearly a great many things could be done, but some primary
uses would be:
				294
				295	<ol>
				296	<li> Checking speed variations between compilers.
				297	<li> Checking relative performance between systems or CPUs.
				298	</ol>
				299
				300	<p> A combination of measuring some fundamental routines and some
				301	representative application routines might satisfy these.
				302
				303	<p> The tune/time.c routines would be the easiest way to get good accurate
				304	measurements on lots of different systems. The high level
				305	<code>speed_measure</code> may or may not suit, but the basic
				306	<code>speed_starttime</code> and <code>speed_endtime</code> would cover
				307	lots of portability and accuracy questions.
				308
				309
				310	<li> <strong>Using <code>restrict</code></strong>
				311
				312	<p> There might be some value in judicious use of C99 style
				313	<code>restrict</code> on various pointers, but this would need some
				314	careful thought about what it implies for the various operand overlaps
				315	permitted in GMP.
				316
				317	<p> Rumour has it some pre-C99 compilers had <code>restrict</code>, but
				318	expressing tighter (or perhaps looser) requirements. Might be worth
				319	investigating that before using <code>restrict</code> unconditionally.
				320
				321	<p> Loops are presumably where the greatest benefit would be had, by allowing
				322	the compiler to advance reads ahead of writes, perhaps as part of loop
				323	unrolling. However critical loops are generally coded in assembler, so
				324	there might not be very much to gain. And on Cray systems the explicit
				325	use of <code>_Pragma</code> gives an equivalent effect.
				326
				327	<p> One thing to note is that Microsoft C headers (on ia64 at least) contain
				328	<code>__declspec(restrict)</code>, so a <code>#define</code> of
<code>restrict</code> should be avoided. It might be wisest to set up a
				330	<code>gmp_restrict</code>.
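<p> A <code>gmp_restrict</code> macro might be set up along these lines. This
is a sketch only: the compiler tests shown are assumptions about what a
configure check would establish, and <code>copy_limbs</code> is a made-up
example function.

```c
#include <stddef.h>

/* Pick whichever spelling the compiler understands, without ever
   redefining the word "restrict" itself. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define gmp_restrict restrict
#elif defined(__GNUC__)
#  define gmp_restrict __restrict__
#elif defined(_MSC_VER)
#  define gmp_restrict __restrict
#else
#  define gmp_restrict              /* unknown compiler: no qualifier */
#endif

typedef unsigned long limb_t;

/* dst and src must not overlap at all; that is the promise being made. */
void copy_limbs(limb_t *gmp_restrict dst, const limb_t *gmp_restrict src,
                size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```

<p> Routines that permit overlapping operands could not be declared this way,
which is the part needing careful thought.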
				331
				332
				333	<li> <strong>Prime Testing</strong>
				334
				335	<p> GMP is not really a number theory library and probably shouldn't have
				336	large amounts of code dedicated to sophisticated prime testing
				337	algorithms, but basic things well-implemented would suit. Tests offering
				338	certainty are probably all too big or too slow (or both!) to justify
				339	inclusion in the main library. Demo programs showing some possibilities
				340	would be good though.
				341
				342	<p> The present "repetitions" argument to <code>mpz_probab_prime_p</code> is
				343	rather specific to the Miller-Rabin tests of the current implementation.
				344	Better would be some sort of parameter asking perhaps for a maximum
				345	chance 1/2^x of a probable prime in fact being composite. If
applications follow the advice that the present reps gives a 1/4^reps
				347	chance then perhaps such a change is unnecessary, but an explicitly
				348	described 1/2^x would allow for changes in the implementation or even for
				349	new proofs about the theory.
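<p> With the current Miller-Rabin bound, converting a requested 1/2^x into a
repetition count is just a rounding-up halving, since 4^-reps = 2^(-2 reps).
The helper below is hypothetical and only illustrates that arithmetic.

```c
/* Smallest reps such that 4^-reps <= 2^-error_bits, i.e. the ceiling
   of error_bits / 2, computed without risking overflow. */
unsigned reps_for_error_bits(unsigned error_bits)
{
    return error_bits / 2 + error_bits % 2;
}
```

<p> For instance a requested bound of 1/2^64 corresponds to 32 repetitions. A
different test with a better per-round bound would only need to change this
one conversion.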
				350
				351	<p> <code>mpz_probab_prime_p</code> always initializes a new
				352	<code>gmp_randstate_t</code> for randomized tests, which unfortunately
				353	means it's not really very random and in particular always runs the same
				354	tests for a given input. Perhaps a new interface could accept an rstate
				355	to use, so successive tests could increase confidence in the result.
				356
				357	<p> <code>mpn_mod_34lsub1</code> is an obvious and easy improvement to the
				358	trial divisions. And since the various prime factors are constants, the
				359	remainder can be tested with something like
				360	<pre>
				361	#define MP_LIMB_DIVISIBLE_7_P(n) \
((n) * MODLIMB_INVERSE_7 &lt;= MP_LIMB_T_MAX/7)
				363	</pre>
				364	Which would help compilers that don't know how to optimize divisions by
				365	constants, and is even an improvement on current gcc 3.2 code. This
				366	technique works for any modulus, see Granlund and Montgomery "Division by
				367	Invariant Integers" section 9.
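<p> The same test can be written for any odd divisor on a 64-bit limb. In the
sketch below the function names are hypothetical and the inverse is computed
at run time by Newton iteration, whereas the macro above assumes a
precomputed constant.

```c
#include <stdint.h>

/* Multiplicative inverse of an odd d modulo 2^64.  Starting from
   inv = d (correct to 3 bits, since d*d == 1 mod 8), each step doubles
   the number of correct low bits: 3, 6, 12, 24, 48, 96. */
uint64_t inverse_mod_2_64(uint64_t d)
{
    uint64_t inv = d;
    int i;
    for (i = 0; i < 5; i++)
        inv *= 2 - d * inv;
    return inv;
}

/* For odd d, multiplying by the inverse permutes the residues mod 2^64
   and sends each multiple d*q to q, so n is divisible by d exactly
   when the product is at most floor((2^64 - 1) / d). */
int divisible_by_odd_p(uint64_t n, uint64_t d)
{
    return n * inverse_mod_2_64(d) <= UINT64_MAX / d;
}
```

<p> In a table-driven trial division both the inverse and the limit
<code>UINT64_MAX / d</code> would be stored per prime, leaving one multiply
and one compare per candidate factor.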
				368
				369	<p> The trial divisions are done with primes generated and grouped at
				370	runtime. This could instead be a table of data, with pre-calculated
inverses too. Storing deltas, i.e. amounts to add, rather than actual
				372	primes would save space. <code>udiv_qrnnd_preinv</code> style inverses
				373	can be made to exist by adding dummy factors of 2 if necessary. Some
				374	thought needs to be given as to how big such a table should be, based on
				375	how much dividing would be profitable for what sort of size inputs. The
				376	data could be shared by the perfect power testing.
				377
				378	<p> Jason Moxham points out that if a sqrt(-1) mod N exists then any factor
				379	of N must be == 1 mod 4, saving half the work in trial dividing. (If
				380	x^2==-1 mod N then for a prime factor p we have x^2==-1 mod p and so the
Jacobi symbol (-1/p)=1. But also (-1/p)=(-1)^((p-1)/2), hence we must have
p==1 mod 4.) But knowing whether sqrt(-1) mod N exists is not too easy.
A strong pseudoprime test can reveal one, so perhaps such a test could be
inserted part way through the dividing.
				385
				386	<p> Jon Grantham "Frobenius Pseudoprimes" (www.pseudoprime.com) describes a
				387	quadratic pseudoprime test taking about 3x longer than a plain test, but
				388	with only a 1/7710 chance of error (whereas 3 plain Miller-Rabin tests
				389	would offer only (1/4)^3 == 1/64). Such a test needs completely random
				390	parameters to satisfy the theory, though single-limb values would run
				391	faster. It's probably best to do at least one plain Miller-Rabin before
				392	any quadratic tests, since that can identify composites in less total
				393	time.
				394
				395	<p> Some thought needs to be given to the structure of which tests (trial
				396	division, Miller-Rabin, quadratic) and how many are done, based on what
				397	sort of inputs we expect, with a view to minimizing average time.
				398
				399	<p> It might be a good idea to break out subroutines for the various tests,
				400	so that an application can combine them in ways it prefers, if sensible
				401	defaults in <code>mpz_probab_prime_p</code> don't suit. In particular
this would let applications skip tests they knew would be unprofitable,
				403	like trial dividing when an input is already known to have no small
				404	factors.
				405
				406	<p> For small inputs, combinations of theory and explicit search make it
				407	relatively easy to offer certainty. For instance numbers up to 2^32
				408	could be handled with a strong pseudoprime test and table lookup. But
				409	it's rather doubtful whether a smallnum prime test belongs in a bignum
				410	library. Perhaps if it had other internal uses.
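<p> As an illustration of certainty for small inputs, the sketch below swaps
the table lookup for a different but related technique: strong pseudoprime
tests to the fixed bases 2, 7 and 61, which are known to give a correct
answer for every n below 4,759,123,141 and hence for all 32-bit inputs. The
function names are hypothetical and this is not GMP code.

```c
#include <stdint.h>

/* base^exp mod mod, with 64-bit intermediates so nothing overflows. */
static uint32_t powmod32(uint32_t base, uint32_t exp, uint32_t mod)
{
    uint64_t result = 1, b = base % mod;
    while (exp) {
        if (exp & 1)
            result = result * b % mod;
        b = b * b % mod;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* Strong probable prime test of odd n > 2 to base a, a not divisible by n. */
static int strong_probable_prime(uint32_t n, uint32_t a)
{
    uint32_t d = n - 1;
    uint64_t x;
    int s = 0, i;

    while ((d & 1) == 0) {          /* write n - 1 = d * 2^s with d odd */
        d >>= 1;
        s++;
    }
    x = powmod32(a, d, n);
    if (x == 1 || x == n - 1)
        return 1;
    for (i = 1; i < s; i++) {
        x = x * x % n;
        if (x == n - 1)
            return 1;
    }
    return 0;
}

/* Deterministic primality test for any 32-bit n. */
int is_prime_u32(uint32_t n)
{
    static const uint32_t bases[] = { 2, 7, 61 };
    int i;

    if (n < 2)
        return 0;
    if (n % 2 == 0)
        return n == 2;
    for (i = 0; i < 3; i++) {
        if (n == bases[i])
            return 1;               /* avoid testing a base divisible by n */
        if (!strong_probable_prime(n, bases[i]))
            return 0;
    }
    return 1;
}
```

<p> The table-lookup variant described above would run a single base-2 test
and then consult a list of the base-2 strong pseudoprimes in range, trading
two modular exponentiations for some static data.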
				411
				412	<p> An <code>mpz_nthprime</code> might be cute, but is almost certainly
				413	impractical for anything but small n.
				414
				415
				416	<li> <strong>Intra-Library Calls</strong>
				417
				418	<p> On various systems, calls within libgmp still go through the PLT, TOC or
				419	other mechanism, which makes the code bigger and slower than it needs to
				420	be.
				421
				422	<p> The theory would be to have all GMP intra-library calls resolved directly
				423	to the routines in the library. An application wouldn't be able to
				424	replace a routine, the way it can normally, but there seems no good
				425	reason to do that, in normal circumstances.
				426
				427	<p> The <code>visibility</code> attribute in recent gcc is good for this,
				428	because it lets gcc omit unnecessary GOT pointer setups or whatever if it
finds all calls are local and there are no global data references.
				430	Documented entrypoints would be <code>protected</code>, and purely
				431	internal things not wanted by test programs or anything can be
				432	<code>internal</code>.
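<p> In source form the two visibility classes might be applied with macros
such as the following. The macro and function names are hypothetical, and
the guard on <code>__ELF__</code> is an assumption that these visibility
classes are only meaningful on ELF targets.

```c
/* Visibility classes as supported by gcc on ELF systems; elsewhere the
   macros expand to nothing and the code still compiles. */
#if defined(__GNUC__) && defined(__ELF__)
#  define LIB_PROTECTED __attribute__((visibility("protected")))
#  define LIB_INTERNAL  __attribute__((visibility("internal")))
#else
#  define LIB_PROTECTED
#  define LIB_INTERNAL
#endif

/* Purely internal helper: never referenced from outside the library. */
LIB_INTERNAL long helper_double(long x)
{
    return 2 * x;
}

/* Documented entry point: callable by applications, but references
   from inside the library bind to this definition directly. */
LIB_PROTECTED long lib_entry(long x)
{
    return helper_double(x) + 1;
}
```

<p> Whether the compiler then drops the GOT setup depends on the target and
compiler version, so the generated code would need inspecting per platform.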
				433
				434	<p> Unfortunately, on i386 it seems <code>protected</code> ends up causing
				435	text segment relocations within libgmp.so, meaning the library code can't
				436	be shared between processes, defeating the purpose of a shared library.
				437	Perhaps this is just a gremlin in binutils (debian packaged
				438	2.13.90.0.16-1).
				439
				440	<p> The linker can be told directly (with a link script, or options) to do
				441	the same sort of thing. This doesn't change the code emitted by gcc of
				442	course, but it does mean calls are resolved directly to their targets,
				443	avoiding a PLT entry.
				444
				445	<p> Keeping symbols private to libgmp.so is probably a good thing in general
				446	too, to stop anyone even attempting to access them. But some
				447	undocumented things will need or want to be kept visible, for use by
				448	mpfr, or the test and tune programs. Libtool has a standard option for
				449	selecting public symbols (used now for libmp).
				450
				451
				452	<li> <strong>Math functions for the mpf layer</strong>
				453
				454	<p> Implement the functions of math.h for the GMP mpf layer! Check the book
"Pi and the AGM" by Borwein and Borwein for ideas on how to do this. These
				456	functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2,
				457	cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
				458
<p> Note that <a href="http://mpfr.org">mpfr</a> already provides these
functions, and that we usually recommend that new programs use mpfr instead
of mpf.
				462	</ul>
				463	<hr>
				464
				465	</body>
				466	</html>
				467
				468	<!--
				469	Local variables:
				470	eval: (add-hook 'write-file-hooks 'time-stamp)
				471	time-stamp-start: "This file current as of "
				472	time-stamp-format: "%:d %3b %:y"
				473	time-stamp-end: "\\."
				474	time-stamp-line-limit: 50
				475	End:
				476	-->