blob: a69d0e9b0ec21b251096d27e153335602debee98 [file] [log] [blame]
Austin Schuhdace2a62020-08-18 10:56:48 -07001\input texinfo @c -*-texinfo-*-
2@c %**start of header
3@setfilename gmp.info
4@documentencoding ISO-8859-1
5@include version.texi
6@settitle GNU MP @value{VERSION}
7@synindex tp fn
8@iftex
9@afourpaper
10@end iftex
11@comment %**end of header
12
13@copying
14This manual describes how to install and use the GNU multiple precision
15arithmetic library, version @value{VERSION}.
16
17Copyright 1991, 1993-2016, 2018 Free Software Foundation, Inc.
18
19Permission is granted to copy, distribute and/or modify this document under
20the terms of the GNU Free Documentation License, Version 1.3 or any later
21version published by the Free Software Foundation; with no Invariant Sections,
22with the Front-Cover Texts being ``A GNU Manual'', and with the Back-Cover
23Texts being ``You have freedom to copy and modify this GNU Manual, like GNU
24software''. A copy of the license is included in
25@ref{GNU Free Documentation License}.
26@end copying
27@c Note the @ref above must be on one line, a line break in an @ref within
28@c @copying will bomb in recent texinfo.tex (eg. 2004-04-07.08 which comes
29@c with texinfo 4.7), with messages about missing @endcsname.
30
31
32@c Texinfo version 4.2 or up will be needed to process this file.
33@c
34@c The version number and edition number are taken from version.texi provided
35@c by automake (note that it's regenerated only if you configure with
36@c --enable-maintainer-mode).
37@c
38@c Notes discussing the present version number of GMP in relation to previous
39@c ones (for instance in the "Compatibility" section) must be updated at
40@c manually though.
41@c
42@c @cindex entries have been made for function categories and programming
43@c topics. The "mpn" section is not included in this, because a beginner
44@c looking for "GCD" or something is only going to be confused by pointers to
45@c low level routines.
46@c
47@c @cindex entries are present for processors and systems when there's
48@c particular notes concerning them, but not just for everything GMP
49@c supports.
50@c
51@c Index entries for files use @code rather than @file, @samp or @option,
52@c since the latter come out with quotes in TeX, which are nice in the text
53@c but don't look so good in index columns.
54@c
55@c Tex:
56@c
57@c A suitable texinfo.tex is supplied, a newer one should work equally well.
58@c
59@c HTML:
60@c
61@c Nothing special is done for links to external manuals, they just come out
62@c in the usual makeinfo style, eg. "../libc/Locales.html". If you have
63@c local copies of such manuals then this is a good thing, if not then you
64@c may want to search-and-replace to some online source.
65@c
66
67@dircategory GNU libraries
68@direntry
69* gmp: (gmp). GNU Multiple Precision Arithmetic Library.
70@end direntry
71
72@c html <meta name="description" content="...">
73@documentdescription
74How to install and use the GNU multiple precision arithmetic library, version @value{VERSION}.
75@end documentdescription
76
77@c smallbook
78@finalout
79@setchapternewpage on
80
81@ifnottex
82@node Top, Copying, (dir), (dir)
83@top GNU MP
84@end ifnottex
85
86@iftex
87@titlepage
88@title GNU MP
89@subtitle The GNU Multiple Precision Arithmetic Library
90@subtitle Edition @value{EDITION}
91@subtitle @value{UPDATED}
92
93@author by Torbj@"orn Granlund and the GMP development team
94@c @email{tg@@gmplib.org}
95
96@c Include the Distribution inside the titlepage so
97@c that headings are turned off.
98
99@tex
100\global\parindent=0pt
101\global\parskip=8pt
102\global\baselineskip=13pt
103@end tex
104
105@page
106@vskip 0pt plus 1filll
107@end iftex
108
109@insertcopying
110@ifnottex
111@sp 1
112@end ifnottex
113
114@iftex
115@end titlepage
116@headings double
117@end iftex
118
119@c Don't bother with contents for html, the menus seem adequate.
120@ifnothtml
121@contents
122@end ifnothtml
123
124@menu
125* Copying:: GMP Copying Conditions (LGPL).
126* Introduction to GMP:: Brief introduction to GNU MP.
127* Installing GMP:: How to configure and compile the GMP library.
128* GMP Basics:: What every GMP user should know.
129* Reporting Bugs:: How to usefully report bugs.
130* Integer Functions:: Functions for arithmetic on signed integers.
131* Rational Number Functions:: Functions for arithmetic on rational numbers.
132* Floating-point Functions:: Functions for arithmetic on floats.
133* Low-level Functions:: Fast functions for natural numbers.
134* Random Number Functions:: Functions for generating random numbers.
135* Formatted Output:: @code{printf} style output.
136* Formatted Input:: @code{scanf} style input.
137* C++ Class Interface:: Class wrappers around GMP types.
138* Custom Allocation:: How to customize the internal allocation.
139* Language Bindings:: Using GMP from other languages.
140* Algorithms:: What happens behind the scenes.
141* Internals:: How values are represented behind the scenes.
142
143* Contributors:: Who brings you this library?
144* References:: Some useful papers and books to read.
145* GNU Free Documentation License::
146* Concept Index::
147* Function Index::
148@end menu
149
150
151@c @m{T,N} is $T$ in tex or @math{N} otherwise. Commas in N or T don't work,
152@c but @C{} can be used instead.
153@iftex
154@macro m {T,N}
155@tex$\T\$@end tex
156@end macro
157@end iftex
158@ifnottex
159@macro m {T,N}
160@math{\N\}
161@end macro
162@end ifnottex
163
164@c @mm{T,N} is $T$ tex and html and @math{N} in info. Commas in N or T don't
165@c work, but @C{} can be used instead.
166@iftex
167@macro mm {T,N}
168@tex$\T\$@end tex
169@end macro
170@end iftex
171
172@ifhtml
173@macro mm {T,N}
174@math{\T\}
175@end macro
176@end ifhtml
177
178@ifinfo
179@macro mm {T,N}
180@math{\N\}
181@end macro
182@end ifinfo
183
184
185@macro C {}
186,
187@end macro
188
189@c @ms{V,N} is $V_N$ in tex or just vn otherwise. This suits simple
190@c subscripts like @ms{x,0}.
191@iftex
192@macro ms {V,N}
193@tex$\V\_{\N\}$@end tex
194@end macro
195@end iftex
196@ifnottex
197@macro ms {V,N}
198\V\\N\
199@end macro
200@end ifnottex
201
202@c @nicode{S} is plain S in info, or @code{S} elsewhere. This can be used
203@c when the quotes that @code{} gives in info aren't wanted, but the
204@c fontification in tex or html is wanted. Doesn't work as @nicode{'\\0'}
205@c though (gives two backslashes in tex).
206@ifinfo
207@macro nicode {S}
208\S\
209@end macro
210@end ifinfo
211@ifnotinfo
212@macro nicode {S}
213@code{\S\}
214@end macro
215@end ifnotinfo
216
217@c @nisamp{S} is plain S in info, or @samp{S} elsewhere. This can be used
218@c when the quotes that @samp{} gives in info aren't wanted, but the
219@c fontification in tex or html is wanted.
220@ifinfo
221@macro nisamp {S}
222\S\
223@end macro
224@end ifinfo
225@ifnotinfo
226@macro nisamp {S}
227@samp{\S\}
228@end macro
229@end ifnotinfo
230
231@c Usage: @GMPtimes{}
232@c Give either \times or the word "times".
233@tex
234\gdef\GMPtimes{\times}
235@end tex
236@ifnottex
237@macro GMPtimes
238times
239@end macro
240@end ifnottex
241
242@c Usage: @GMPmultiply{}
243@c Give * in info, or nothing in tex.
244@tex
245\gdef\GMPmultiply{}
246@end tex
247@ifnottex
248@macro GMPmultiply
249*
250@end macro
251@end ifnottex
252
253@c Usage: @GMPabs{x}
254@c Give either |x| in tex, or abs(x) in info or html.
255@tex
256\gdef\GMPabs#1{|#1|}
257@end tex
258@ifnottex
259@macro GMPabs {X}
260@abs{}(\X\)
261@end macro
262@end ifnottex
263
264@c Usage: @GMPfloor{x}
265@c Give either \lfloor x\rfloor in tex, or floor(x) in info or html.
266@tex
267\gdef\GMPfloor#1{\lfloor #1\rfloor}
268@end tex
269@ifnottex
270@macro GMPfloor {X}
271floor(\X\)
272@end macro
273@end ifnottex
274
275@c Usage: @GMPceil{x}
276@c Give either \lceil x\rceil in tex, or ceil(x) in info or html.
277@tex
278\gdef\GMPceil#1{\lceil #1 \rceil}
279@end tex
280@ifnottex
281@macro GMPceil {X}
282ceil(\X\)
283@end macro
284@end ifnottex
285
286@c Math operators already available in tex, made available in info too.
287@c For example @bmod{} can be used in both tex and info.
288@ifnottex
289@macro bmod
290mod
291@end macro
292@macro gcd
293gcd
294@end macro
295@macro ge
296>=
297@end macro
298@macro le
299<=
300@end macro
301@macro log
302log
303@end macro
304@macro min
305min
306@end macro
307@macro leftarrow
308<-
309@end macro
310@macro rightarrow
311->
312@end macro
313@end ifnottex
314
315@c New math operators.
316@c @abs{} can be used in both tex and info, or just \abs in tex.
317@tex
318\gdef\abs{\mathop{\rm abs}}
319@end tex
320@ifnottex
321@macro abs
322abs
323@end macro
324@end ifnottex
325
326@c @cross{} is a \times symbol in tex, or an "x" in info. In tex it works
327@c inside or outside $ $.
328@tex
329\gdef\cross{\ifmmode\times\else$\times$\fi}
330@end tex
331@ifnottex
332@macro cross
333x
334@end macro
335@end ifnottex
336
337@c @times{} made available as a "*" in info and html (already works in tex).
338@ifnottex
339@macro times
340*
341@end macro
342@end ifnottex
343
344@c Usage: @W{text}
345@c Like @w{} but working in math mode too.
346@tex
347\gdef\W#1{\ifmmode{#1}\else\w{#1}\fi}
348@end tex
349@ifnottex
350@macro W {S}
351@w{\S\}
352@end macro
353@end ifnottex
354
355@c Usage: \GMPdisplay{text}
356@c Put the given text in an @display style indent, but without turning off
357@c paragraph reflow etc.
358@tex
359\gdef\GMPdisplay#1{%
360\noindent
361\advance\leftskip by \lispnarrowing
362#1\par}
363@end tex
364
365@c Usage: \GMPhat
366@c A new \hat that will work in math mode, unlike the texinfo redefined
367@c version.
368@tex
369\gdef\GMPhat{\mathaccent"705E}
370@end tex
371
372@c Usage: \GMPraise{text}
373@c For use in a $ $ math expression as an alternative to "^". This is good
374@c for @code{} in an exponent, since there seems to be no superscript font
375@c for that.
376@tex
377\gdef\GMPraise#1{\mskip0.5\thinmuskip\hbox{\raise0.8ex\hbox{#1}}}
378@end tex
379
380@c Usage: @texlinebreak{}
381@c A line break as per @*, but only in tex.
382@iftex
383@macro texlinebreak
384@*
385@end macro
386@end iftex
387@ifnottex
388@macro texlinebreak
389@end macro
390@end ifnottex
391
392@c Usage: @maybepagebreak
393@c Allow tex to insert a page break, if it feels the urge.
394@c Normally blocks of @deftypefun/funx are kept together, which can lead to
395@c some poor page break positioning if it's a big block, like the sets of
396@c division functions etc.
397@tex
398\gdef\maybepagebreak{\penalty0}
399@end tex
400@ifnottex
401@macro maybepagebreak
402@end macro
403@end ifnottex
404
405@c Usage: @GMPreftop{info,title}
406@c Usage: @GMPpxreftop{info,title}
407@c
408@c Like @ref{} and @pxref{}, but designed for a reference to the top of a
409@c document, not a particular section. The TeX output for plain @ref insists
410@c on printing a particular section, GMPreftop gives just the title.
411@c
412@c The texinfo manual recommends putting a likely section name in references
413@c like this, eg. "Introduction", but it seems better to just give the title.
414@c
415@iftex
416@macro GMPreftop{info,title}
417@i{\title\}
418@end macro
419@macro GMPpxreftop{info,title}
420see @i{\title\}
421@end macro
422@end iftex
423@c
424@ifnottex
425@macro GMPreftop{info,title}
426@ref{Top,\title\,\title\,\info\,\title\}
427@end macro
428@macro GMPpxreftop{info,title}
429@pxref{Top,\title\,\title\,\info\,\title\}
430@end macro
431@end ifnottex
432
433
434@node Copying, Introduction to GMP, Top, Top
435@comment node-name, next, previous, up
436@unnumbered GNU MP Copying Conditions
437@cindex Copying conditions
438@cindex Conditions for copying GNU MP
439@cindex License conditions
440
441This library is @dfn{free}; this means that everyone is free to use it and
442free to redistribute it on a free basis. The library is not in the public
443domain; it is copyrighted and there are restrictions on its distribution, but
444these restrictions are designed to permit everything that a good cooperating
445citizen would want to do. What is not allowed is to try to prevent others
446from further sharing any version of this library that they might get from
447you.@refill
448
449Specifically, we want to make sure that you have the right to give away copies
450of the library, that you receive source code or else can get it if you want
451it, that you can change this library or use pieces of it in new free programs,
452and that you know you can do these things.@refill
453
454To make sure that everyone has such rights, we have to forbid you to deprive
455anyone else of these rights. For example, if you distribute copies of the GNU
456MP library, you must give the recipients all the rights that you have. You
457must make sure that they, too, receive or can get the source code. And you
458must tell them their rights.@refill
459
460Also, for our own protection, we must make certain that everyone finds out
461that there is no warranty for the GNU MP library. If it is modified by
462someone else and passed on, we want their recipients to know that what they
463have is not what we distributed, so that any problems introduced by others
464will not reflect on our reputation.@refill
465
466More precisely, the GNU MP library is dual licensed, under the conditions of
467the GNU Lesser General Public License version 3 (see
468@file{COPYING.LESSERv3}), or the GNU General Public License version 2 (see
469@file{COPYINGv2}). This is the recipient's choice, and the recipient also has
470the additional option of applying later versions of these licenses. (The
471reason for this dual licensing is to make it possible to use the library with
472programs which are licensed under GPL version 2, but which for historical or
473other reasons do not allow use under later versions of the GPL).
474
475Programs which are not part of the library itself, such as demonstration
476programs and the GMP testsuite, are licensed under the terms of the GNU
477General Public License version 3 (see @file{COPYINGv3}), or any later
478version.
479
480
481@node Introduction to GMP, Installing GMP, Copying, Top
482@comment node-name, next, previous, up
483@chapter Introduction to GNU MP
484@cindex Introduction
485
486GNU MP is a portable library written in C for arbitrary precision arithmetic
487on integers, rational numbers, and floating-point numbers. It aims to provide
488the fastest possible arithmetic for all applications that need higher
489precision than is directly supported by the basic C types.
490
491Many applications use just a few hundred bits of precision; but some
492applications may need thousands or even millions of bits. GMP is designed to
493give good performance for both, by choosing algorithms based on the sizes of
494the operands, and by carefully keeping the overhead at a minimum.
495
496The speed of GMP is achieved by using fullwords as the basic arithmetic type,
497by using sophisticated algorithms, by including carefully optimized assembly
498code for the most common inner loops for many different CPUs, and by a general
499emphasis on speed (as opposed to simplicity or elegance).
500
501There is assembly code for these CPUs:
502@cindex CPU types
503ARM Cortex-A9, Cortex-A15, and generic ARM,
504DEC Alpha 21064, 21164, and 21264,
505AMD K8 and K10 (sold under many brands, e.g. Athlon64, Phenom, Opteron)
506Bulldozer, and Bobcat,
507Intel Pentium, Pentium Pro/II/III, Pentium 4, Core2, Nehalem, Sandy bridge, Haswell, generic x86,
508Intel IA-64,
509Motorola/IBM PowerPC 32 and 64 such as POWER970, POWER5, POWER6, and POWER7,
510MIPS 32-bit and 64-bit,
511SPARC 32-bit ad 64-bit with special support for all UltraSPARC models.
512There is also assembly code for many obsolete CPUs.
513
514
515@cindex Home page
516@cindex Web page
517@noindent
518For up-to-date information on GMP, please see the GMP web pages at
519
520@display
521@uref{https://gmplib.org/}
522@end display
523
524@cindex Latest version of GMP
525@cindex Anonymous FTP of latest version
526@cindex FTP of latest version
527@noindent
528The latest version of the library is available at
529
530@display
531@uref{https://ftp.gnu.org/gnu/gmp/}
532@end display
533
534Many sites around the world mirror @samp{ftp.gnu.org}, please use a mirror
535near you, see @uref{https://www.gnu.org/order/ftp.html} for a full list.
536
537@cindex Mailing lists
538There are three public mailing lists of interest. One for release
539announcements, one for general questions and discussions about usage of the GMP
540library and one for bug reports. For more information, see
541
542@display
543@uref{https://gmplib.org/mailman/listinfo/}.
544@end display
545
546The proper place for bug reports is @email{gmp-bugs@@gmplib.org}. See
547@ref{Reporting Bugs} for information about reporting bugs.
548
549@sp 1
550@section How to use this Manual
551@cindex About this manual
552
553Everyone should read @ref{GMP Basics}. If you need to install the library
554yourself, then read @ref{Installing GMP}. If you have a system with multiple
555ABIs, then read @ref{ABI and ISA}, for the compiler options that must be used
556on applications.
557
558The rest of the manual can be used for later reference, although it is
559probably a good idea to glance through it.
560
561
562@node Installing GMP, GMP Basics, Introduction to GMP, Top
563@comment node-name, next, previous, up
564@chapter Installing GMP
565@cindex Installing GMP
566@cindex Configuring GMP
567@cindex Building GMP
568
569GMP has an autoconf/automake/libtool based configuration system. On a
570Unix-like system a basic build can be done with
571
572@example
573./configure
574make
575@end example
576
577@noindent
578Some self-tests can be run with
579
580@example
581make check
582@end example
583
584@noindent
585And you can install (under @file{/usr/local} by default) with
586
587@example
588make install
589@end example
590
591If you experience problems, please report them to @email{gmp-bugs@@gmplib.org}.
592See @ref{Reporting Bugs}, for information on what to include in useful bug
593reports.
594
595@menu
596* Build Options::
597* ABI and ISA::
598* Notes for Package Builds::
599* Notes for Particular Systems::
600* Known Build Problems::
601* Performance optimization::
602@end menu
603
604
605@node Build Options, ABI and ISA, Installing GMP, Installing GMP
606@section Build Options
607@cindex Build options
608
609All the usual autoconf configure options are available, run @samp{./configure
610--help} for a summary. The file @file{INSTALL.autoconf} has some generic
611installation information too.
612
613@table @asis
614@item Tools
615@cindex Non-Unix systems
616@samp{configure} requires various Unix-like tools. See @ref{Notes for
617Particular Systems}, for some options on non-Unix systems.
618
619It might be possible to build without the help of @samp{configure}, certainly
620all the code is there, but unfortunately you'll be on your own.
621
622@item Build Directory
623@cindex Build directory
624To compile in a separate build directory, @command{cd} to that directory, and
625prefix the configure command with the path to the GMP source directory. For
626example
627
628@example
629cd /my/build/dir
630/my/sources/gmp-@value{VERSION}/configure
631@end example
632
633Not all @samp{make} programs have the necessary features (@code{VPATH}) to
634support this. In particular, SunOS and Slowaris @command{make} have bugs that
635make them unable to build in a separate directory. Use GNU @command{make}
636instead.
637
638@item @option{--prefix} and @option{--exec-prefix}
639@cindex Prefix
640@cindex Exec prefix
641@cindex Install prefix
642@cindex @code{--prefix}
643@cindex @code{--exec-prefix}
644The @option{--prefix} option can be used in the normal way to direct GMP to
645install under a particular tree. The default is @samp{/usr/local}.
646
647@option{--exec-prefix} can be used to direct architecture-dependent files like
648@file{libgmp.a} to a different location. This can be used to share
649architecture-independent parts like the documentation, but separate the
650dependent parts. Note however that @file{gmp.h} is
651architecture-dependent since it encodes certain aspects of @file{libgmp}, so
652it will be necessary to ensure both @file{$prefix/include} and
653@file{$exec_prefix/include} are available to the compiler.
654
655@item @option{--disable-shared}, @option{--disable-static}
656@cindex @code{--disable-shared}
657@cindex @code{--disable-static}
658By default both shared and static libraries are built (where possible), but
659one or other can be disabled. Shared libraries result in smaller executables
660and permit code sharing between separate running processes, but on some CPUs
661are slightly slower, having a small cost on each function call.
662
663@item Native Compilation, @option{--build=CPU-VENDOR-OS}
664@cindex Native compilation
665@cindex Build system
666@cindex @code{--build}
667For normal native compilation, the system can be specified with
668@samp{--build}. By default @samp{./configure} uses the output from running
669@samp{./config.guess}. On some systems @samp{./config.guess} can determine
670the exact CPU type, on others it will be necessary to give it explicitly. For
671example,
672
673@example
674./configure --build=ultrasparc-sun-solaris2.7
675@end example
676
677In all cases the @samp{OS} part is important, since it controls how libtool
678generates shared libraries. Running @samp{./config.guess} is the simplest way
679to see what it should be, if you don't know already.
680
681@item Cross Compilation, @option{--host=CPU-VENDOR-OS}
682@cindex Cross compiling
683@cindex Host system
684@cindex @code{--host}
685When cross-compiling, the system used for compiling is given by @samp{--build}
686and the system where the library will run is given by @samp{--host}. For
687example when using a FreeBSD Athlon system to build GNU/Linux m68k binaries,
688
689@example
690./configure --build=athlon-pc-freebsd3.5 --host=m68k-mac-linux-gnu
691@end example
692
693Compiler tools are sought first with the host system type as a prefix. For
694example @command{m68k-mac-linux-gnu-ranlib} is tried, then plain
695@command{ranlib}. This makes it possible for a set of cross-compiling tools
696to co-exist with native tools. The prefix is the argument to @samp{--host},
697and this can be an alias, such as @samp{m68k-linux}. But note that tools
698don't have to be setup this way, it's enough to just have a @env{PATH} with a
699suitable cross-compiling @command{cc} etc.
700
701Compiling for a different CPU in the same family as the build system is a form
702of cross-compilation, though very possibly this would merely be special
703options on a native compiler. In any case @samp{./configure} avoids depending
704on being able to run code on the build system, which is important when
705creating binaries for a newer CPU since they very possibly won't run on the
706build system.
707
708In all cases the compiler must be able to produce an executable (of whatever
709format) from a standard C @code{main}. Although only object files will go to
710make up @file{libgmp}, @samp{./configure} uses linking tests for various
711purposes, such as determining what functions are available on the host system.
712
713Currently a warning is given unless an explicit @samp{--build} is used when
714cross-compiling, because it may not be possible to correctly guess the build
715system type if the @env{PATH} has only a cross-compiling @command{cc}.
716
717Note that the @samp{--target} option is not appropriate for GMP@. It's for use
718when building compiler tools, with @samp{--host} being where they will run,
719and @samp{--target} what they'll produce code for. Ordinary programs or
720libraries like GMP are only interested in the @samp{--host} part, being where
721they'll run. (Some past versions of GMP used @samp{--target} incorrectly.)
722
723@item CPU types
724@cindex CPU types
725In general, if you want a library that runs as fast as possible, you should
726configure GMP for the exact CPU type your system uses. However, this may mean
727the binaries won't run on older members of the family, and might run slower on
728other members, older or newer. The best idea is always to build GMP for the
729exact machine type you intend to run it on.
730
731The following CPUs have specific support. See @file{configure.ac} for details
732of what code and compiler options they select.
733
734@itemize @bullet
735
736@c Keep this formatting, it's easy to read and it can be grepped to
737@c automatically test that CPUs listed get through ./config.sub
738
739@item
740Alpha:
741@nisamp{alpha},
742@nisamp{alphaev5},
743@nisamp{alphaev56},
744@nisamp{alphapca56},
745@nisamp{alphapca57},
746@nisamp{alphaev6},
747@nisamp{alphaev67},
748@nisamp{alphaev68}
749@nisamp{alphaev7}
750
751@item
752Cray:
753@nisamp{c90},
754@nisamp{j90},
755@nisamp{t90},
756@nisamp{sv1}
757
758@item
759HPPA:
760@nisamp{hppa1.0},
761@nisamp{hppa1.1},
762@nisamp{hppa2.0},
763@nisamp{hppa2.0n},
764@nisamp{hppa2.0w},
765@nisamp{hppa64}
766
767@item
768IA-64:
769@nisamp{ia64},
770@nisamp{itanium},
771@nisamp{itanium2}
772
773@item
774MIPS:
775@nisamp{mips},
776@nisamp{mips3},
777@nisamp{mips64}
778
779@item
780Motorola:
781@nisamp{m68k},
782@nisamp{m68000},
783@nisamp{m68010},
784@nisamp{m68020},
785@nisamp{m68030},
786@nisamp{m68040},
787@nisamp{m68060},
788@nisamp{m68302},
789@nisamp{m68360},
790@nisamp{m88k},
791@nisamp{m88110}
792
793@item
794POWER:
795@nisamp{power},
796@nisamp{power1},
797@nisamp{power2},
798@nisamp{power2sc}
799
800@item
801PowerPC:
802@nisamp{powerpc},
803@nisamp{powerpc64},
804@nisamp{powerpc401},
805@nisamp{powerpc403},
806@nisamp{powerpc405},
807@nisamp{powerpc505},
808@nisamp{powerpc601},
809@nisamp{powerpc602},
810@nisamp{powerpc603},
811@nisamp{powerpc603e},
812@nisamp{powerpc604},
813@nisamp{powerpc604e},
814@nisamp{powerpc620},
815@nisamp{powerpc630},
816@nisamp{powerpc740},
817@nisamp{powerpc7400},
818@nisamp{powerpc7450},
819@nisamp{powerpc750},
820@nisamp{powerpc801},
821@nisamp{powerpc821},
822@nisamp{powerpc823},
823@nisamp{powerpc860},
824@nisamp{powerpc970}
825
826@item
827SPARC:
828@nisamp{sparc},
829@nisamp{sparcv8},
830@nisamp{microsparc},
831@nisamp{supersparc},
832@nisamp{sparcv9},
833@nisamp{ultrasparc},
834@nisamp{ultrasparc2},
835@nisamp{ultrasparc2i},
836@nisamp{ultrasparc3},
837@nisamp{sparc64}
838
839@item
840x86 family:
841@nisamp{i386},
842@nisamp{i486},
843@nisamp{i586},
844@nisamp{pentium},
845@nisamp{pentiummmx},
846@nisamp{pentiumpro},
847@nisamp{pentium2},
848@nisamp{pentium3},
849@nisamp{pentium4},
850@nisamp{k6},
851@nisamp{k62},
852@nisamp{k63},
853@nisamp{athlon},
854@nisamp{amd64},
855@nisamp{viac3},
856@nisamp{viac32}
857
858@item
859Other:
860@nisamp{arm},
861@nisamp{sh},
862@nisamp{sh2},
863@nisamp{vax},
864@end itemize
865
866CPUs not listed will use generic C code.
867
868@item Generic C Build
869@cindex Generic C
870If some of the assembly code causes problems, or if otherwise desired, the
871generic C code can be selected with the configure @option{--disable-assembly}.
872
873Note that this will run quite slowly, but it should be portable and should at
874least make it possible to get something running if all else fails.
875
876@item Fat binary, @option{--enable-fat}
877@cindex Fat binary
878@cindex @code{--enable-fat}
879Using @option{--enable-fat} selects a ``fat binary'' build on x86, where
880optimized low level subroutines are chosen at runtime according to the CPU
881detected. This means more code, but gives good performance on all x86 chips.
882(This option might become available for more architectures in the future.)
883
884@item @option{ABI}
885@cindex ABI
886On some systems GMP supports multiple ABIs (application binary interfaces),
887meaning data type sizes and calling conventions. By default GMP chooses the
888best ABI available, but a particular ABI can be selected. For example
889
890@example
891./configure --host=mips64-sgi-irix6 ABI=n32
892@end example
893
894See @ref{ABI and ISA}, for the available choices on relevant CPUs, and what
895applications need to do.
896
897@item @option{CC}, @option{CFLAGS}
898@cindex C compiler
899@cindex @code{CC}
900@cindex @code{CFLAGS}
901By default the C compiler used is chosen from among some likely candidates,
902with @command{gcc} normally preferred if it's present. The usual
903@samp{CC=whatever} can be passed to @samp{./configure} to choose something
904different.
905
906For various systems, default compiler flags are set based on the CPU and
907compiler. The usual @samp{CFLAGS="-whatever"} can be passed to
908@samp{./configure} to use something different or to set good flags for systems
909GMP doesn't otherwise know.
910
911The @samp{CC} and @samp{CFLAGS} used are printed during @samp{./configure},
912and can be found in each generated @file{Makefile}. This is the easiest way
913to check the defaults when considering changing or adding something.
914
915Note that when @samp{CC} and @samp{CFLAGS} are specified on a system
916supporting multiple ABIs it's important to give an explicit
917@samp{ABI=whatever}, since GMP can't determine the ABI just from the flags and
918won't be able to select the correct assembly code.
919
920If just @samp{CC} is selected then normal default @samp{CFLAGS} for that
921compiler will be used (if GMP recognises it). For example @samp{CC=gcc} can
922be used to force the use of GCC, with default flags (and default ABI).
923
924@item @option{CPPFLAGS}
925@cindex @code{CPPFLAGS}
926Any flags like @samp{-D} defines or @samp{-I} includes required by the
927preprocessor should be set in @samp{CPPFLAGS} rather than @samp{CFLAGS}.
928Compiling is done with both @samp{CPPFLAGS} and @samp{CFLAGS}, but
929preprocessing uses just @samp{CPPFLAGS}. This distinction is because most
930preprocessors won't accept all the flags the compiler does. Preprocessing is
931done separately in some configure tests.
932
933@item @option{CC_FOR_BUILD}
934@cindex @code{CC_FOR_BUILD}
935Some build-time programs are compiled and run to generate host-specific data
936tables. @samp{CC_FOR_BUILD} is the compiler used for this. It doesn't need
937to be in any particular ABI or mode, it merely needs to generate executables
938that can run. The default is to try the selected @samp{CC} and some likely
939candidates such as @samp{cc} and @samp{gcc}, looking for something that works.
940
941No flags are used with @samp{CC_FOR_BUILD} because a simple invocation like
942@samp{cc foo.c} should be enough. If some particular options are required
943they can be included as for instance @samp{CC_FOR_BUILD="cc -whatever"}.
944
945@item C++ Support, @option{--enable-cxx}
946@cindex C++ support
947@cindex @code{--enable-cxx}
948C++ support in GMP can be enabled with @samp{--enable-cxx}, in which case a
949C++ compiler will be required. As a convenience @samp{--enable-cxx=detect}
950can be used to enable C++ support only if a compiler can be found. The C++
951support consists of a library @file{libgmpxx.la} and header file
952@file{gmpxx.h} (@pxref{Headers and Libraries}).
953
954A separate @file{libgmpxx.la} has been adopted rather than having C++ objects
955within @file{libgmp.la} in order to ensure dynamic linked C programs aren't
956bloated by a dependency on the C++ standard library, and to avoid any chance
957that the C++ compiler could be required when linking plain C programs.
958
959@file{libgmpxx.la} will use certain internals from @file{libgmp.la} and can
960only be expected to work with @file{libgmp.la} from the same GMP version.
961Future changes to the relevant internals will be accompanied by renaming, so a
962mismatch will cause unresolved symbols rather than perhaps mysterious
963misbehaviour.
964
965In general @file{libgmpxx.la} will be usable only with the C++ compiler that
966built it, since name mangling and runtime support are usually incompatible
967between different compilers.
968
969@item @option{CXX}, @option{CXXFLAGS}
970@cindex C++ compiler
971@cindex @code{CXX}
972@cindex @code{CXXFLAGS}
973When C++ support is enabled, the C++ compiler and its flags can be set with
974variables @samp{CXX} and @samp{CXXFLAGS} in the usual way. The default for
975@samp{CXX} is the first compiler that works from a list of likely candidates,
976with @command{g++} normally preferred when available. The default for
977@samp{CXXFLAGS} is to try @samp{CFLAGS}, @samp{CFLAGS} without @samp{-g}, then
978for @command{g++} either @samp{-g -O2} or @samp{-O2}, or for other compilers
979@samp{-g} or nothing. Trying @samp{CFLAGS} this way is convenient when using
980@samp{gcc} and @samp{g++} together, since the flags for @samp{gcc} will
981usually suit @samp{g++}.
982
983It's important that the C and C++ compilers match, meaning their startup and
984runtime support routines are compatible and that they generate code in the
985same ABI (if there's a choice of ABIs on the system). @samp{./configure}
986isn't currently able to check these things very well itself, so for that
987reason @samp{--disable-cxx} is the default, to avoid a build failure due to a
988compiler mismatch. Perhaps this will change in the future.
989
990Incidentally, it's normally not good enough to set @samp{CXX} to the same as
991@samp{CC}. Although @command{gcc} for instance recognises @file{foo.cc} as
992C++ code, only @command{g++} will invoke the linker the right way when
993building an executable or shared library from C++ object files.
994
995@item Temporary Memory, @option{--enable-alloca=<choice>}
996@cindex Temporary memory
997@cindex Stack overflow
998@cindex @code{alloca}
999@cindex @code{--enable-alloca}
1000GMP allocates temporary workspace using one of the following three methods,
1001which can be selected with for instance
1002@samp{--enable-alloca=malloc-reentrant}.
1003
1004@itemize @bullet
1005@item
1006@samp{alloca} - C library or compiler builtin.
1007@item
1008@samp{malloc-reentrant} - the heap, in a re-entrant fashion.
1009@item
1010@samp{malloc-notreentrant} - the heap, with global variables.
1011@end itemize
1012
1013For convenience, the following choices are also available.
1014@samp{--disable-alloca} is the same as @samp{no}.
1015
1016@itemize @bullet
1017@item
1018@samp{yes} - a synonym for @samp{alloca}.
1019@item
1020@samp{no} - a synonym for @samp{malloc-reentrant}.
1021@item
1022@samp{reentrant} - @code{alloca} if available, otherwise
1023@samp{malloc-reentrant}. This is the default.
1024@item
1025@samp{notreentrant} - @code{alloca} if available, otherwise
1026@samp{malloc-notreentrant}.
1027@end itemize
1028
1029@code{alloca} is reentrant and fast, and is recommended. It actually allocates
1030just small blocks on the stack; larger ones use malloc-reentrant.
1031
1032@samp{malloc-reentrant} is, as the name suggests, reentrant and thread safe,
1033but @samp{malloc-notreentrant} is faster and should be used if reentrancy is
1034not required.
1035
1036The two malloc methods in fact use the memory allocation functions selected by
1037@code{mp_set_memory_functions}, these being @code{malloc} and friends by
1038default. @xref{Custom Allocation}.
1039
1040An additional choice @samp{--enable-alloca=debug} is available, to help when
1041debugging memory related problems (@pxref{Debugging}).
1042
1043@item FFT Multiplication, @option{--disable-fft}
1044@cindex FFT multiplication
1045@cindex @code{--disable-fft}
1046By default multiplications are done using Karatsuba, 3-way Toom, higher degree
1047Toom, and Fermat FFT@. The FFT is only used on large to very large operands
1048and can be disabled to save code size if desired.
1049
1050@item Assertion Checking, @option{--enable-assert}
1051@cindex Assertion checking
1052@cindex @code{--enable-assert}
1053This option enables some consistency checking within the library. This can be
1054of use while debugging, @pxref{Debugging}.
1055
1056@item Execution Profiling, @option{--enable-profiling=prof/gprof/instrument}
1057@cindex Execution profiling
1058@cindex @code{--enable-profiling}
1059Enable profiling support, in one of various styles, @pxref{Profiling}.
1060
1061@item @option{MPN_PATH}
1062@cindex @code{MPN_PATH}
1063Various assembly versions of each mpn subroutines are provided. For a given
1064CPU, a search is made though a path to choose a version of each. For example
1065@samp{sparcv8} has
1066
1067@example
1068MPN_PATH="sparc32/v8 sparc32 generic"
1069@end example
1070
1071which means look first for v8 code, then plain sparc32 (which is v7), and
1072finally fall back on generic C@. Knowledgeable users with special requirements
1073can specify a different path. Normally this is completely unnecessary.
1074
1075@item Documentation
1076@cindex Documentation formats
1077@cindex Texinfo
1078The source for the document you're now reading is @file{doc/gmp.texi}, in
1079Texinfo format, see @GMPreftop{texinfo, Texinfo}.
1080
1081@cindex Postscript
1082@cindex DVI
1083@cindex PDF
1084Info format @samp{doc/gmp.info} is included in the distribution. The usual
1085automake targets are available to make PostScript, DVI, PDF and HTML (these
1086will require various @TeX{} and Texinfo tools).
1087
1088@cindex DocBook
1089@cindex XML
1090DocBook and XML can be generated by the Texinfo @command{makeinfo} program
1091too, see @ref{makeinfo options,, Options for @command{makeinfo}, texinfo,
1092Texinfo}.
1093
1094Some supplementary notes can also be found in the @file{doc} subdirectory.
1095
1096@end table
1097
1098
1099@need 2000
1100@node ABI and ISA, Notes for Package Builds, Build Options, Installing GMP
1101@section ABI and ISA
1102@cindex ABI
1103@cindex Application Binary Interface
1104@cindex ISA
1105@cindex Instruction Set Architecture
1106
1107ABI (Application Binary Interface) refers to the calling conventions between
1108functions, meaning what registers are used and what sizes the various C data
1109types are. ISA (Instruction Set Architecture) refers to the instructions and
1110registers a CPU has available.
1111
1112Some 64-bit ISA CPUs have both a 64-bit ABI and a 32-bit ABI defined, the
1113latter for compatibility with older CPUs in the family. GMP supports some
1114CPUs like this in both ABIs. In fact within GMP @samp{ABI} means a
1115combination of chip ABI, plus how GMP chooses to use it. For example in some
111632-bit ABIs, GMP may support a limb as either a 32-bit @code{long} or a 64-bit
1117@code{long long}.
1118
1119By default GMP chooses the best ABI available for a given system, and this
1120generally gives significantly greater speed. But an ABI can be chosen
1121explicitly to make GMP compatible with other libraries, or particular
1122application requirements. For example,
1123
1124@example
1125./configure ABI=32
1126@end example
1127
1128In all cases it's vital that all object code used in a given program is
1129compiled for the same ABI.
1130
1131Usually a limb is implemented as a @code{long}. When a @code{long long} limb
1132is used this is encoded in the generated @file{gmp.h}. This is convenient for
1133applications, but it does mean that @file{gmp.h} will vary, and can't be just
1134copied around. @file{gmp.h} remains compiler independent though, since all
1135compilers for a particular ABI will be expected to use the same limb type.
1136
1137Currently no attempt is made to follow whatever conventions a system has for
1138installing library or header files built for a particular ABI@. This will
1139probably only matter when installing multiple builds of GMP, and it might be
1140as simple as configuring with a special @samp{libdir}, or it might require
1141more than that. Note that builds for different ABIs need to done separately,
1142with a fresh @command{./configure} and @command{make} each.
1143
1144@sp 1
1145@table @asis
1146@need 1000
1147@item AMD64 (@samp{x86_64})
1148@cindex AMD64
1149On AMD64 systems supporting both 32-bit and 64-bit modes for applications, the
1150following ABI choices are available.
1151
1152@table @asis
1153@item @samp{ABI=64}
1154The 64-bit ABI uses 64-bit limbs and pointers and makes full use of the chip
1155architecture. This is the default. Applications will usually not need
1156special compiler flags, but for reference the option is
1157
1158@example
1159gcc -m64
1160@end example
1161
1162@item @samp{ABI=32}
1163The 32-bit ABI is the usual i386 conventions. This will be slower, and is not
1164recommended except for inter-operating with other code not yet 64-bit capable.
1165Applications must be compiled with
1166
1167@example
1168gcc -m32
1169@end example
1170
1171(In GCC 2.95 and earlier there's no @samp{-m32} option, it's the only mode.)
1172
1173@item @samp{ABI=x32}
1174The x32 ABI uses 64-bit limbs but 32-bit pointers. Like the 64-bit ABI, it
1175makes full use of the chip's arithmetic capabilities. This ABI is not
1176supported by all operating systems.
1177
1178@example
1179gcc -mx32
1180@end example
1181
1182@end table
1183
1184@sp 1
1185@need 1000
1186@item HPPA 2.0 (@samp{hppa2.0*}, @samp{hppa64})
1187@cindex HPPA
1188@cindex HP-UX
1189@table @asis
1190@item @samp{ABI=2.0w}
1191The 2.0w ABI uses 64-bit limbs and pointers and is available on HP-UX 11 or
1192up. Applications must be compiled with
1193
1194@example
1195gcc [built for 2.0w]
1196cc +DD64
1197@end example
1198
1199@item @samp{ABI=2.0n}
1200The 2.0n ABI means the 32-bit HPPA 1.0 ABI and all its normal calling
1201conventions, but with 64-bit instructions permitted within functions. GMP
1202uses a 64-bit @code{long long} for a limb. This ABI is available on hppa64
1203GNU/Linux and on HP-UX 10 or higher. Applications must be compiled with
1204
1205@example
1206gcc [built for 2.0n]
1207cc +DA2.0 +e
1208@end example
1209
1210Note that current versions of GCC (eg.@: 3.2) don't generate 64-bit
1211instructions for @code{long long} operations and so may be slower than for
12122.0w. (The GMP assembly code is the same though.)
1213
1214@item @samp{ABI=1.0}
1215HPPA 2.0 CPUs can run all HPPA 1.0 and 1.1 code in the 32-bit HPPA 1.0 ABI@.
1216No special compiler options are needed for applications.
1217@end table
1218
1219All three ABIs are available for CPU types @samp{hppa2.0w}, @samp{hppa2.0} and
1220@samp{hppa64}, but for CPU type @samp{hppa2.0n} only 2.0n or 1.0 are
1221considered.
1222
1223Note that GCC on HP-UX has no options to choose between 2.0n and 2.0w modes,
1224unlike HP @command{cc}. Instead it must be built for one or the other ABI@.
1225GMP will detect how it was built, and skip to the corresponding @samp{ABI}.
1226
1227@sp 1
1228@need 1500
1229@item IA-64 under HP-UX (@samp{ia64*-*-hpux*}, @samp{itanium*-*-hpux*})
1230@cindex IA-64
1231@cindex HP-UX
1232HP-UX supports two ABIs for IA-64. GMP performance is the same in both.
1233
1234@table @asis
1235@item @samp{ABI=32}
1236In the 32-bit ABI, pointers, @code{int}s and @code{long}s are 32 bits and GMP
1237uses a 64 bit @code{long long} for a limb. Applications can be compiled
1238without any special flags since this ABI is the default in both HP C and GCC,
1239but for reference the flags are
1240
1241@example
1242gcc -milp32
1243cc +DD32
1244@end example
1245
1246@item @samp{ABI=64}
1247In the 64-bit ABI, @code{long}s and pointers are 64 bits and GMP uses a
1248@code{long} for a limb. Applications must be compiled with
1249
1250@example
1251gcc -mlp64
1252cc +DD64
1253@end example
1254@end table
1255
1256On other IA-64 systems, GNU/Linux for instance, @samp{ABI=64} is the only
1257choice.
1258
1259@sp 1
1260@need 1000
1261@item MIPS under IRIX 6 (@samp{mips*-*-irix[6789]})
1262@cindex MIPS
1263@cindex IRIX
1264IRIX 6 always has a 64-bit MIPS 3 or better CPU, and supports ABIs o32, n32,
1265and 64. n32 or 64 are recommended, and GMP performance will be the same in
1266each. The default is n32.
1267
1268@table @asis
1269@item @samp{ABI=o32}
1270The o32 ABI is 32-bit pointers and integers, and no 64-bit operations. GMP
1271will be slower than in n32 or 64, this option only exists to support old
1272compilers, eg.@: GCC 2.7.2. Applications can be compiled with no special
1273flags on an old compiler, or on a newer compiler with
1274
1275@example
1276gcc -mabi=32
1277cc -32
1278@end example
1279
1280@item @samp{ABI=n32}
1281The n32 ABI is 32-bit pointers and integers, but with a 64-bit limb using a
1282@code{long long}. Applications must be compiled with
1283
1284@example
1285gcc -mabi=n32
1286cc -n32
1287@end example
1288
1289@item @samp{ABI=64}
1290The 64-bit ABI is 64-bit pointers and integers. Applications must be compiled
1291with
1292
1293@example
1294gcc -mabi=64
1295cc -64
1296@end example
1297@end table
1298
1299Note that MIPS GNU/Linux, as of kernel version 2.2, doesn't have the necessary
1300support for n32 or 64 and so only gets a 32-bit limb and the MIPS 2 code.
1301
1302@sp 1
1303@need 1000
1304@item PowerPC 64 (@samp{powerpc64}, @samp{powerpc620}, @samp{powerpc630}, @samp{powerpc970}, @samp{power4}, @samp{power5})
1305@cindex PowerPC
1306@table @asis
1307@item @samp{ABI=mode64}
1308@cindex AIX
1309The AIX 64 ABI uses 64-bit limbs and pointers and is the default on PowerPC 64
1310@samp{*-*-aix*} systems. Applications must be compiled with
1311
1312@example
1313gcc -maix64
1314xlc -q64
1315@end example
1316
1317On 64-bit GNU/Linux, BSD, and Mac OS X/Darwin systems, the applications must
1318be compiled with
1319
1320@example
1321gcc -m64
1322@end example
1323
1324@item @samp{ABI=mode32}
1325The @samp{mode32} ABI uses a 64-bit @code{long long} limb but with the chip
1326still in 32-bit mode and using 32-bit calling conventions. This is the default
1327for systems where the true 64-bit ABI is unavailable. No special compiler
1328options are typically needed for applications. This ABI is not available under
1329AIX.
1330
1331@item @samp{ABI=32}
1332This is the basic 32-bit PowerPC ABI, with a 32-bit limb. No special compiler
1333options are needed for applications.
1334@end table
1335
1336GMP's speed is greatest for the @samp{mode64} ABI, the @samp{mode32} ABI is 2nd
1337best. In @samp{ABI=32} only the 32-bit ISA is used and this doesn't make full
1338use of a 64-bit chip.
1339
1340@sp 1
1341@need 1000
1342@item Sparc V9 (@samp{sparc64}, @samp{sparcv9}, @samp{ultrasparc*})
1343@cindex Sparc V9
1344@cindex Solaris
1345@cindex Sun
1346@table @asis
1347@item @samp{ABI=64}
1348The 64-bit V9 ABI is available on the various BSD sparc64 ports, recent
1349versions of Sparc64 GNU/Linux, and Solaris 2.7 and up (when the kernel is in
135064-bit mode). GCC 3.2 or higher, or Sun @command{cc} is required. On
1351GNU/Linux, depending on the default @command{gcc} mode, applications must be
1352compiled with
1353
1354@example
1355gcc -m64
1356@end example
1357
1358On Solaris applications must be compiled with
1359
1360@example
1361gcc -m64 -mptr64 -Wa,-xarch=v9 -mcpu=v9
1362cc -xarch=v9
1363@end example
1364
1365On the BSD sparc64 systems no special options are required, since 64-bits is
1366the only ABI available.
1367
1368@item @samp{ABI=32}
1369For the basic 32-bit ABI, GMP still uses as much of the V9 ISA as it can. In
1370the Sun documentation this combination is known as ``v8plus''. On GNU/Linux,
1371depending on the default @command{gcc} mode, applications may need to be
1372compiled with
1373
1374@example
1375gcc -m32
1376@end example
1377
1378On Solaris, no special compiler options are required for applications, though
1379using something like the following is recommended. (@command{gcc} 2.8 and
1380earlier only support @samp{-mv8} though.)
1381
1382@example
1383gcc -mv8plus
1384cc -xarch=v8plus
1385@end example
1386@end table
1387
1388GMP speed is greatest in @samp{ABI=64}, so it's the default where available.
1389The speed is partly because there are extra registers available and partly
1390because 64-bits is considered the more important case and has therefore had
1391better code written for it.
1392
1393Don't be confused by the names of the @samp{-m} and @samp{-x} compiler
1394options, they're called @samp{arch} but effectively control both ABI and ISA@.
1395
1396On Solaris 2.6 and earlier, only @samp{ABI=32} is available since the kernel
1397doesn't save all registers.
1398
1399On Solaris 2.7 with the kernel in 32-bit mode, a normal native build will
1400reject @samp{ABI=64} because the resulting executables won't run.
1401@samp{ABI=64} can still be built if desired by making it look like a
1402cross-compile, for example
1403
1404@example
1405./configure --build=none --host=sparcv9-sun-solaris2.7 ABI=64
1406@end example
1407@end table
1408
1409
1410@need 2000
1411@node Notes for Package Builds, Notes for Particular Systems, ABI and ISA, Installing GMP
1412@section Notes for Package Builds
1413@cindex Build notes for binary packaging
1414@cindex Packaged builds
1415
1416GMP should present no great difficulties for packaging in a binary
1417distribution.
1418
1419@cindex Libtool versioning
1420@cindex Shared library versioning
1421Libtool is used to build the library and @samp{-version-info} is set
1422appropriately, having started from @samp{3:0:0} in GMP 3.0 (@pxref{Versioning,
1423Library interface versions, Library interface versions, libtool, GNU
1424Libtool}).
1425
1426The GMP 4 series will be upwardly binary compatible in each release and will
1427be upwardly binary compatible with all of the GMP 3 series. Additional
1428function interfaces may be added in each release, so on systems where libtool
1429versioning is not fully checked by the loader an auxiliary mechanism may be
1430needed to express that a dynamic linked application depends on a new enough
1431GMP.
1432
1433An auxiliary mechanism may also be needed to express that @file{libgmpxx.la}
1434(from @option{--enable-cxx}, @pxref{Build Options}) requires @file{libgmp.la}
1435from the same GMP version, since this is not done by the libtool versioning,
1436nor otherwise. A mismatch will result in unresolved symbols from the linker,
1437or perhaps the loader.
1438
1439When building a package for a CPU family, care should be taken to use
1440@samp{--host} (or @samp{--build}) to choose the least common denominator among
1441the CPUs which might use the package. For example this might mean plain
1442@samp{sparc} (meaning V7) for SPARCs.
1443
1444For x86s, @option{--enable-fat} sets things up for a fat binary build, making a
1445runtime selection of optimized low level routines. This is a good choice for
1446packaging to run on a range of x86 chips.
1447
1448Users who care about speed will want GMP built for their exact CPU type, to
1449make best use of the available optimizations. Providing a way to suitably
1450rebuild a package may be useful. This could be as simple as making it
1451possible for a user to omit @samp{--build} (and @samp{--host}) so
1452@samp{./config.guess} will detect the CPU@. But a way to manually specify a
1453@samp{--build} will be wanted for systems where @samp{./config.guess} is
1454inexact.
1455
1456On systems with multiple ABIs, a packaged build will need to decide which
1457among the choices is to be provided, see @ref{ABI and ISA}. A given run of
1458@samp{./configure} etc will only build one ABI@. If a second ABI is also
1459required then a second run of @samp{./configure} etc must be made, starting
1460from a clean directory tree (@samp{make distclean}).
1461
1462As noted under ``ABI and ISA'', currently no attempt is made to follow system
1463conventions for install locations that vary with ABI, such as
1464@file{/usr/lib/sparcv9} for @samp{ABI=64} as opposed to @file{/usr/lib} for
1465@samp{ABI=32}. A package build can override @samp{libdir} and other standard
1466variables as necessary.
1467
1468Note that @file{gmp.h} is a generated file, and will be architecture and ABI
1469dependent. When attempting to install two ABIs simultaneously it will be
1470important that an application compile gets the correct @file{gmp.h} for its
1471desired ABI@. If compiler include paths don't vary with ABI options then it
1472might be necessary to create a @file{/usr/include/gmp.h} which tests
1473preprocessor symbols and chooses the correct actual @file{gmp.h}.
1474
1475
1476@need 2000
1477@node Notes for Particular Systems, Known Build Problems, Notes for Package Builds, Installing GMP
1478@section Notes for Particular Systems
1479@cindex Build notes for particular systems
1480@cindex Particular systems
1481@cindex Systems
1482@table @asis
1483
1484@c This section is more or less meant for notes about performance or about
1485@c build problems that have been worked around but might leave a user
1486@c scratching their head. Fun with different ABIs on a system belongs in the
1487@c above section.
1488
1489@item AIX 3 and 4
1490@cindex AIX
1491On systems @samp{*-*-aix[34]*} shared libraries are disabled by default, since
1492some versions of the native @command{ar} fail on the convenience libraries
1493used. A shared build can be attempted with
1494
1495@example
1496./configure --enable-shared --disable-static
1497@end example
1498
1499Note that the @samp{--disable-static} is necessary because in a shared build
1500libtool makes @file{libgmp.a} a symlink to @file{libgmp.so}, apparently for
1501the benefit of old versions of @command{ld} which only recognise @file{.a},
1502but unfortunately this is done even if a fully functional @command{ld} is
1503available.
1504
1505@item ARM
1506@cindex ARM
1507On systems @samp{arm*-*-*}, versions of GCC up to and including 2.95.3 have a
1508bug in unsigned division, giving wrong results for some operands. GMP
1509@samp{./configure} will demand GCC 2.95.4 or later.
1510
1511@item Compaq C++
1512@cindex Compaq C++
1513Compaq C++ on OSF 5.1 has two flavours of @code{iostream}, a standard one and
1514an old pre-standard one (see @samp{man iostream_intro}). GMP can only use the
1515standard one, which unfortunately is not the default but must be selected by
1516defining @code{__USE_STD_IOSTREAM}. Configure with for instance
1517
1518@example
1519./configure --enable-cxx CPPFLAGS=-D__USE_STD_IOSTREAM
1520@end example
1521
1522@item Floating Point Mode
1523@cindex Floating point mode
1524@cindex Hardware floating point mode
1525@cindex Precision of hardware floating point
1526@cindex x87
1527On some systems, the hardware floating point has a control mode which can set
1528all operations to be done in a particular precision, for instance single,
1529double or extended on x86 systems (x87 floating point). The GMP functions
1530involving a @code{double} cannot be expected to operate to their full
1531precision when the hardware is in single precision mode. Of course this
1532affects all code, including application code, not just GMP.
1533
1534@item FreeBSD 7.x, 8.x, 9.0, 9.1, 9.2
1535@cindex FreeBSD
1536@command{m4} in these releases of FreeBSD has an eval function which ignores
1537its 2nd and 3rd arguments, which makes it unsuitable for @file{.asm} file
1538processing. @samp{./configure} will detect the problem and either abort or
1539choose another m4 in the @env{PATH}. The bug is fixed in FreeBSD 9.3 and 10.0,
1540so either upgrade or use GNU m4. Note that the FreeBSD package system installs
1541GNU m4 under the name @samp{gm4}, which GMP cannot guess.
1542
1543@item FreeBSD 7.x, 8.x, 9.x
1544@cindex FreeBSD
1545GMP releases starting with 6.0 do not support @samp{ABI=32} on FreeBSD/amd64
1546prior to release 10.0 of the system. The cause is a broken @code{limits.h},
1547which GMP no longer works around.
1548
1549@item MS-DOS and MS Windows
1550@cindex MS-DOS
1551@cindex MS Windows
1552@cindex Windows
1553@cindex Cygwin
1554@cindex DJGPP
1555@cindex MINGW
1556On an MS-DOS system DJGPP can be used to build GMP, and on an MS Windows
1557system Cygwin, DJGPP and MINGW can be used. All three are excellent ports of
1558GCC and the various GNU tools.
1559
1560@display
1561@uref{https://www.cygwin.com/}
1562@uref{http://www.delorie.com/djgpp/}
1563@uref{http://www.mingw.org/}
1564@end display
1565
1566@cindex Interix
1567@cindex Services for Unix
1568Microsoft also publishes an Interix ``Services for Unix'' which can be used to
1569build GMP on Windows (with a normal @samp{./configure}), but it's not free
1570software.
1571
1572@item MS Windows DLLs
1573@cindex DLLs
1574@cindex MS Windows
1575@cindex Windows
1576On systems @samp{*-*-cygwin*}, @samp{*-*-mingw*} and @samp{*-*-pw32*} by
1577default GMP builds only a static library, but a DLL can be built instead using
1578
1579@example
1580./configure --disable-static --enable-shared
1581@end example
1582
1583Static and DLL libraries can't both be built, since certain export directives
1584in @file{gmp.h} must be different.
1585
1586A MINGW DLL build of GMP can be used with Microsoft C@. Libtool doesn't
1587install a @file{.lib} format import library, but it can be created with MS
1588@command{lib} as follows, and copied to the install directory. Similarly for
1589@file{libmp} and @file{libgmpxx}.
1590
1591@example
1592cd .libs
1593lib /def:libgmp-3.dll.def /out:libgmp-3.lib
1594@end example
1595
1596MINGW uses the C runtime library @samp{msvcrt.dll} for I/O, so applications
1597wanting to use the GMP I/O routines must be compiled with @samp{cl /MD} to do
1598the same. If one of the other C runtime library choices provided by MS C is
1599desired then the suggestion is to use the GMP string functions and confine I/O
1600to the application.
1601
1602@item Motorola 68k CPU Types
1603@cindex 68000
1604@samp{m68k} is taken to mean 68000. @samp{m68020} or higher will give a
1605performance boost on applicable CPUs. @samp{m68360} can be used for CPU32
1606series chips. @samp{m68302} can be used for ``Dragonball'' series chips,
1607though this is merely a synonym for @samp{m68000}.
1608
1609@item NetBSD 5.x
1610@cindex NetBSD
1611@command{m4} in these releases of NetBSD has an eval function which ignores its
16122nd and 3rd arguments, which makes it unsuitable for @file{.asm} file
1613processing. @samp{./configure} will detect the problem and either abort or
1614choose another m4 in the @env{PATH}. The bug is fixed in NetBSD 6, so either
1615upgrade or use GNU m4. Note that the NetBSD package system installs GNU m4
1616under the name @samp{gm4}, which GMP cannot guess.
1617
1618@item OpenBSD 2.6
1619@cindex OpenBSD
1620@command{m4} in this release of OpenBSD has a bug in @code{eval} that makes it
1621unsuitable for @file{.asm} file processing. @samp{./configure} will detect
1622the problem and either abort or choose another m4 in the @env{PATH}. The bug
1623is fixed in OpenBSD 2.7, so either upgrade or use GNU m4.
1624
1625@item Power CPU Types
1626@cindex Power/PowerPC
1627In GMP, CPU types @samp{power*} and @samp{powerpc*} will each use instructions
1628not available on the other, so it's important to choose the right one for the
1629CPU that will be used. Currently GMP has no assembly code support for using
1630just the common instruction subset. To get executables that run on both, the
1631current suggestion is to use the generic C code (@option{--disable-assembly}),
1632possibly with appropriate compiler options (like @samp{-mcpu=common} for
1633@command{gcc}). CPU @samp{rs6000} (which is not a CPU but a family of
1634workstations) is accepted by @file{config.sub}, but is currently equivalent to
1635@option{--disable-assembly}.
1636
1637@item Sparc CPU Types
1638@cindex Sparc
1639@samp{sparcv8} or @samp{supersparc} on relevant systems will give a
1640significant performance increase over the V7 code selected by plain
1641@samp{sparc}.
1642
1643@item Sparc App Regs
1644@cindex Sparc
1645The GMP assembly code for both 32-bit and 64-bit Sparc clobbers the
1646``application registers'' @code{g2}, @code{g3} and @code{g4}, the same way
1647that the GCC default @samp{-mapp-regs} does (@pxref{SPARC Options,, SPARC
1648Options, gcc, Using the GNU Compiler Collection (GCC)}).
1649
1650This makes that code unsuitable for use with the special V9
1651@samp{-mcmodel=embmedany} (which uses @code{g4} as a data segment pointer), and
1652for applications wanting to use those registers for special purposes. In these
1653cases the only suggestion currently is to build GMP with
1654@option{--disable-assembly} to avoid the assembly code.
1655
1656@item SunOS 4
1657@cindex SunOS
1658@command{/usr/bin/m4} lacks various features needed to process @file{.asm}
1659files, and instead @samp{./configure} will automatically use
1660@command{/usr/5bin/m4}, which we believe is always available (if not then use
1661GNU m4).
1662
1663@item x86 CPU Types
1664@cindex x86
1665@cindex 80x86
1666@cindex i386
1667@samp{i586}, @samp{pentium} or @samp{pentiummmx} code is good for its intended
1668P5 Pentium chips, but quite slow when run on Intel P6 class chips (PPro, P-II,
1669P-III)@. @samp{i386} is a better choice when making binaries that must run on
1670both.
1671
1672@item x86 MMX and SSE2 Code
1673@cindex MMX
1674@cindex SSE2
1675If the CPU selected has MMX code but the assembler doesn't support it, a
1676warning is given and non-MMX code is used instead. This will be an inferior
1677build, since the MMX code that's present is there because it's faster than the
1678corresponding plain integer code. The same applies to SSE2.
1679
1680Old versions of @samp{gas} don't support MMX instructions, in particular
1681version 1.92.3 that comes with FreeBSD 2.2.8 or the more recent OpenBSD 3.1
1682doesn't.
1683
1684Solaris 2.6 and 2.7 @command{as} generate incorrect object code for register
1685to register @code{movq} instructions, and so can't be used for MMX code.
1686Install a recent @command{gas} if MMX code is wanted on these systems.
1687@end table
1688
1689
1690@need 2000
1691@node Known Build Problems, Performance optimization, Notes for Particular Systems, Installing GMP
1692@section Known Build Problems
1693@cindex Build problems known
1694
1695@c This section is more or less meant for known build problems that are not
1696@c otherwise worked around and require some sort of manual intervention.
1697
1698You might find more up-to-date information at @uref{https://gmplib.org/}.
1699
1700@table @asis
1701@item Compiler link options
1702The version of libtool currently in use rather aggressively strips compiler
1703options when linking a shared library. This will hopefully be relaxed in the
1704future, but for now if this is a problem the suggestion is to create a little
1705script to hide them, and for instance configure with
1706
1707@example
1708./configure CC=gcc-with-my-options
1709@end example
1710
1711@item DJGPP (@samp{*-*-msdosdjgpp*})
1712@cindex DJGPP
1713The DJGPP port of @command{bash} 2.03 is unable to run the @samp{configure}
1714script, it exits silently, having died writing a preamble to
1715@file{config.log}. Use @command{bash} 2.04 or higher.
1716
1717@samp{make all} was found to run out of memory during the final
1718@file{libgmp.la} link on one system tested, despite having 64Mb available.
1719Running @samp{make libgmp.la} directly helped, perhaps recursing into the
1720various subdirectories uses up memory.
1721
1722@item GNU binutils @command{strip} prior to 2.12
1723@cindex Stripped libraries
1724@cindex Binutils @command{strip}
1725@cindex GNU @command{strip}
1726@command{strip} from GNU binutils 2.11 and earlier should not be used on the
1727static libraries @file{libgmp.a} and @file{libmp.a} since it will discard all
1728but the last of multiple archive members with the same name, like the three
1729versions of @file{init.o} in @file{libgmp.a}. Binutils 2.12 or higher can be
1730used successfully.
1731
1732The shared libraries @file{libgmp.so} and @file{libmp.so} are not affected by
1733this and any version of @command{strip} can be used on them.
1734
1735@item @command{make} syntax error
1736@cindex SCO
1737@cindex IRIX
1738On certain versions of SCO OpenServer 5 and IRIX 6.5 the native @command{make}
1739is unable to handle the long dependencies list for @file{libgmp.la}. The
1740symptom is a ``syntax error'' on the following line of the top-level
1741@file{Makefile}.
1742
1743@example
1744libgmp.la: $(libgmp_la_OBJECTS) $(libgmp_la_DEPENDENCIES)
1745@end example
1746
1747Either use GNU Make, or as a workaround remove
1748@code{$(libgmp_la_DEPENDENCIES)} from that line (which will make the initial
1749build work, but if any recompiling is done @file{libgmp.la} might not be
1750rebuilt).
1751
1752@item MacOS X (@samp{*-*-darwin*})
1753@cindex MacOS X
1754@cindex Darwin
1755Libtool currently only knows how to create shared libraries on MacOS X using
1756the native @command{cc} (which is a modified GCC), not a plain GCC@. A
1757static-only build should work though (@samp{--disable-shared}).
1758
1759@item NeXT prior to 3.3
1760@cindex NeXT
1761The system compiler on old versions of NeXT was a massacred and old GCC, even
1762if it called itself @file{cc}. This compiler cannot be used to build GMP, you
1763need to get a real GCC, and install that. (NeXT may have fixed this in
1764release 3.3 of their system.)
1765
1766@item POWER and PowerPC
1767@cindex Power/PowerPC
1768Bugs in GCC 2.7.2 (and 2.6.3) mean it can't be used to compile GMP on POWER or
1769PowerPC@. If you want to use GCC for these machines, get GCC 2.7.2.1 (or
1770later).
1771
1772@item Sequent Symmetry
1773@cindex Sequent Symmetry
1774Use the GNU assembler instead of the system assembler, since the latter has
1775serious bugs.
1776
1777@item Solaris 2.6
1778@cindex Solaris
1779The system @command{sed} prints an error ``Output line too long'' when libtool
1780builds @file{libgmp.la}. This doesn't seem to cause any obvious ill effects,
1781but GNU @command{sed} is recommended, to avoid any doubt.
1782
1783@item Sparc Solaris 2.7 with gcc 2.95.2 in @samp{ABI=32}
1784@cindex Solaris
1785A shared library build of GMP seems to fail in this combination, it builds but
1786then fails the tests, apparently due to some incorrect data relocations within
1787@code{gmp_randinit_lc_2exp_size}. The exact cause is unknown,
1788@samp{--disable-shared} is recommended.
1789@end table
1790
1791
1792@need 2000
1793@node Performance optimization, , Known Build Problems, Installing GMP
1794@section Performance optimization
1795@cindex Optimizing performance
1796
1797@c At some point, this should perhaps move to a separate chapter on optimizing
1798@c performance.
1799
1800For optimal performance, build GMP for the exact CPU type of the target
1801computer, see @ref{Build Options}.
1802
1803Unlike what is the case for most other programs, the compiler typically
1804doesn't matter much, since GMP uses assembly language for the most critical
1805operation.
1806
1807In particular for long-running GMP applications, and applications demanding
1808extremely large numbers, building and running the @code{tuneup} program in the
1809@file{tune} subdirectory, can be important. For example,
1810
1811@example
1812cd tune
1813make tuneup
1814./tuneup
1815@end example
1816
1817will generate better contents for the @file{gmp-mparam.h} parameter file.
1818
1819To use the results, put the output in the file indicated in the
1820@samp{Parameters for ...} header. Then recompile from scratch.
1821
1822The @code{tuneup} program takes one useful parameter, @samp{-f NNN}, which
1823instructs the program how long to check FFT multiply parameters. If you're
1824going to use GMP for extremely large numbers, you may want to run @code{tuneup}
1825with a large NNN value.
1826
1827
1828@node GMP Basics, Reporting Bugs, Installing GMP, Top
1829@comment node-name, next, previous, up
1830@chapter GMP Basics
1831@cindex Basics
1832
1833@strong{Using functions, macros, data types, etc.@: not documented in this
1834manual is strongly discouraged. If you do so your application is guaranteed
1835to be incompatible with future versions of GMP.}
1836
1837@menu
1838* Headers and Libraries::
1839* Nomenclature and Types::
1840* Function Classes::
1841* Variable Conventions::
1842* Parameter Conventions::
1843* Memory Management::
1844* Reentrancy::
1845* Useful Macros and Constants::
1846* Compatibility with older versions::
1847* Demonstration Programs::
1848* Efficiency::
1849* Debugging::
1850* Profiling::
1851* Autoconf::
1852* Emacs::
1853@end menu
1854
1855@node Headers and Libraries, Nomenclature and Types, GMP Basics, GMP Basics
1856@section Headers and Libraries
1857@cindex Headers
1858
1859@cindex @file{gmp.h}
1860@cindex Include files
1861@cindex @code{#include}
1862All declarations needed to use GMP are collected in the include file
1863@file{gmp.h}. It is designed to work with both C and C++ compilers.
1864
1865@example
1866#include <gmp.h>
1867@end example
1868
1869@cindex @code{stdio.h}
1870Note however that prototypes for GMP functions with @code{FILE *} parameters
1871are only provided if @code{<stdio.h>} is included too.
1872
1873@example
1874#include <stdio.h>
1875#include <gmp.h>
1876@end example
1877
1878@cindex @code{stdarg.h}
1879Likewise @code{<stdarg.h>} is required for prototypes with @code{va_list}
1880parameters, such as @code{gmp_vprintf}. And @code{<obstack.h>} for prototypes
1881with @code{struct obstack} parameters, such as @code{gmp_obstack_printf}, when
1882available.
1883
1884@cindex Libraries
1885@cindex Linking
1886@cindex @code{libgmp}
1887All programs using GMP must link against the @file{libgmp} library. On a
1888typical Unix-like system this can be done with @samp{-lgmp}, for example
1889
1890@example
1891gcc myprogram.c -lgmp
1892@end example
1893
1894@cindex @code{libgmpxx}
1895GMP C++ functions are in a separate @file{libgmpxx} library. This is built
1896and installed if C++ support has been enabled (@pxref{Build Options}). For
1897example,
1898
1899@example
1900g++ mycxxprog.cc -lgmpxx -lgmp
1901@end example
1902
1903@cindex Libtool
1904GMP is built using Libtool and an application can use that to link if desired,
1905@GMPpxreftop{libtool, GNU Libtool}.
1906
1907If GMP has been installed to a non-standard location then it may be necessary
1908to use @samp{-I} and @samp{-L} compiler options to point to the right
1909directories, and some sort of run-time path for a shared library.
1910
1911
1912@node Nomenclature and Types, Function Classes, Headers and Libraries, GMP Basics
1913@section Nomenclature and Types
1914@cindex Nomenclature
1915@cindex Types
1916
1917@cindex Integer
1918@tindex @code{mpz_t}
1919In this manual, @dfn{integer} usually means a multiple precision integer, as
1920defined by the GMP library. The C data type for such integers is @code{mpz_t}.
1921Here are some examples of how to declare such integers:
1922
1923@example
1924mpz_t sum;
1925
1926struct foo @{ mpz_t x, y; @};
1927
1928mpz_t vec[20];
1929@end example
1930
1931@cindex Rational number
1932@tindex @code{mpq_t}
1933@dfn{Rational number} means a multiple precision fraction. The C data type
1934for these fractions is @code{mpq_t}. For example:
1935
1936@example
1937mpq_t quotient;
1938@end example
1939
1940@cindex Floating-point number
1941@tindex @code{mpf_t}
1942@dfn{Floating point number} or @dfn{Float} for short, is an arbitrary precision
1943mantissa with a limited precision exponent. The C data type for such objects
1944is @code{mpf_t}. For example:
1945
1946@example
1947mpf_t fp;
1948@end example
1949
1950@tindex @code{mp_exp_t}
1951The floating point functions accept and return exponents in the C type
1952@code{mp_exp_t}. Currently this is usually a @code{long}, but on some systems
1953it's an @code{int} for efficiency.
1954
1955@cindex Limb
1956@tindex @code{mp_limb_t}
1957A @dfn{limb} means the part of a multi-precision number that fits in a single
1958machine word. (We chose this word because a limb of the human body is
1959analogous to a digit, only larger, and containing several digits.) Normally a
1960limb is 32 or 64 bits. The C data type for a limb is @code{mp_limb_t}.
1961
1962@tindex @code{mp_size_t}
1963Counts of limbs of a multi-precision number represented in the C type
1964@code{mp_size_t}. Currently this is normally a @code{long}, but on some
1965systems it's an @code{int} for efficiency, and on some systems it will be
1966@code{long long} in the future.
1967
1968@tindex @code{mp_bitcnt_t}
1969Counts of bits of a multi-precision number are represented in the C type
1970@code{mp_bitcnt_t}. Currently this is always an @code{unsigned long}, but on
1971some systems it will be an @code{unsigned long long} in the future.
1972
1973@cindex Random state
1974@tindex @code{gmp_randstate_t}
1975@dfn{Random state} means an algorithm selection and current state data. The C
1976data type for such objects is @code{gmp_randstate_t}. For example:
1977
1978@example
1979gmp_randstate_t rstate;
1980@end example
1981
1982Also, in general @code{mp_bitcnt_t} is used for bit counts and ranges, and
1983@code{size_t} is used for byte or character counts.
1984
1985
1986@node Function Classes, Variable Conventions, Nomenclature and Types, GMP Basics
1987@section Function Classes
1988@cindex Function classes
1989
1990There are six classes of functions in the GMP library:
1991
1992@enumerate
1993@item
1994Functions for signed integer arithmetic, with names beginning with
1995@code{mpz_}. The associated type is @code{mpz_t}. There are about 150
1996functions in this class. (@pxref{Integer Functions})
1997
1998@item
1999Functions for rational number arithmetic, with names beginning with
2000@code{mpq_}. The associated type is @code{mpq_t}. There are about 35
2001functions in this class, but the integer functions can be used for arithmetic
2002on the numerator and denominator separately. (@pxref{Rational Number
2003Functions})
2004
2005@item
2006Functions for floating-point arithmetic, with names beginning with
2007@code{mpf_}. The associated type is @code{mpf_t}. There are about 70
2008functions is this class. (@pxref{Floating-point Functions})
2009
2010@item
2011Fast low-level functions that operate on natural numbers. These are used by
2012the functions in the preceding groups, and you can also call them directly
2013from very time-critical user programs. These functions' names begin with
2014@code{mpn_}. The associated type is array of @code{mp_limb_t}. There are
2015about 60 (hard-to-use) functions in this class. (@pxref{Low-level Functions})
2016
2017@item
2018Miscellaneous functions. Functions for setting up custom allocation and
2019functions for generating random numbers. (@pxref{Custom Allocation}, and
2020@pxref{Random Number Functions})
2021@end enumerate
2022
2023
2024@node Variable Conventions, Parameter Conventions, Function Classes, GMP Basics
2025@section Variable Conventions
2026@cindex Variable conventions
2027@cindex Conventions for variables
2028
2029GMP functions generally have output arguments before input arguments. This
2030notation is by analogy with the assignment operator.
2031
2032GMP lets you use the same variable for both input and output in one call. For
2033example, the main function for integer multiplication, @code{mpz_mul}, can be
2034used to square @code{x} and put the result back in @code{x} with
2035
2036@example
2037mpz_mul (x, x, x);
2038@end example
2039
2040Before you can assign to a GMP variable, you need to initialize it by calling
2041one of the special initialization functions. When you're done with a
2042variable, you need to clear it out, using one of the functions for that
2043purpose. Which function to use depends on the type of variable. See the
2044chapters on integer functions, rational number functions, and floating-point
2045functions for details.
2046
2047A variable should only be initialized once, or at least cleared between each
2048initialization. After a variable has been initialized, it may be assigned to
2049any number of times.
2050
2051For efficiency reasons, avoid excessive initializing and clearing. In
2052general, initialize near the start of a function and clear near the end. For
2053example,
2054
2055@example
2056void
2057foo (void)
2058@{
2059 mpz_t n;
2060 int i;
2061 mpz_init (n);
2062 for (i = 1; i < 100; i++)
2063 @{
2064 mpz_mul (n, @dots{});
2065 mpz_fdiv_q (n, @dots{});
2066 @dots{}
2067 @}
2068 mpz_clear (n);
2069@}
2070@end example
2071
2072GMP types like @code{mpz_t} are implemented as one-element arrays of certain
2073structures. Declaring a variable creates an object with the fields GMP needs,
2074but variables are normally manipulated by using the pointer to the object. For
2075both behavior and efficiency reasons, it is discouraged to make copies of the
2076GMP object itself (either directly or via aggregate objects containing such GMP
2077objects). If copies are done, all of them must be used read-only; using a copy
2078as the output of some function will invalidate all the other copies. Note that
2079the actual fields in each @code{mpz_t} etc are for internal use only and should
2080not be accessed directly by code that expects to be compatible with future GMP
2081releases.
2082
2083@node Parameter Conventions, Memory Management, Variable Conventions, GMP Basics
2084@section Parameter Conventions
2085@cindex Parameter conventions
2086@cindex Conventions for parameters
2087
2088When a GMP variable is used as a function parameter, it's effectively a
2089call-by-reference, meaning that when the function stores a value there it will
2090change the original in the caller. Parameters which are input-only can be
2091designated @code{const} to provoke a compiler error or warning on attempting to
2092modify them.
2093
2094When a function is going to return a GMP result, it should designate a
2095parameter that it sets, like the library functions do. More than one value
2096can be returned by having more than one output parameter, again like the
2097library functions. A @code{return} of an @code{mpz_t} etc doesn't return the
2098object, only a pointer, and this is almost certainly not what's wanted.
2099
2100Here's an example accepting an @code{mpz_t} parameter, doing a calculation,
2101and storing the result to the indicated parameter.
2102
2103@example
2104void
2105foo (mpz_t result, const mpz_t param, unsigned long n)
2106@{
2107 unsigned long i;
2108 mpz_mul_ui (result, param, n);
2109 for (i = 1; i < n; i++)
2110 mpz_add_ui (result, result, i*7);
2111@}
2112
2113int
2114main (void)
2115@{
2116 mpz_t r, n;
2117 mpz_init (r);
2118 mpz_init_set_str (n, "123456", 0);
2119 foo (r, n, 20L);
2120 gmp_printf ("%Zd\n", r);
2121 return 0;
2122@}
2123@end example
2124
2125Our function @code{foo} works even if its caller passes the same variable for
2126@code{param} and @code{result}, just like the library functions. But
2127sometimes it's tricky to make that work, and an application might not want to
2128bother supporting that sort of thing.
2129
2130Since GMP types are implemented as one-element arrays, using a GMP variable as
2131a parameter passes a pointer to the object. Hence the call-by-reference.
2132
2133
2134@need 1000
2135@node Memory Management, Reentrancy, Parameter Conventions, GMP Basics
2136@section Memory Management
2137@cindex Memory management
2138
2139The GMP types like @code{mpz_t} are small, containing only a couple of sizes,
2140and pointers to allocated data. Once a variable is initialized, GMP takes
2141care of all space allocation. Additional space is allocated whenever a
2142variable doesn't have enough.
2143
2144@code{mpz_t} and @code{mpq_t} variables never reduce their allocated space.
2145Normally this is the best policy, since it avoids frequent reallocation.
2146Applications that need to return memory to the heap at some particular point
2147can use @code{mpz_realloc2}, or clear variables no longer needed.
2148
2149@code{mpf_t} variables, in the current implementation, use a fixed amount of
2150space, determined by the chosen precision and allocated at initialization, so
2151their size doesn't change.
2152
2153All memory is allocated using @code{malloc} and friends by default, but this
2154can be changed, see @ref{Custom Allocation}. Temporary memory on the stack is
2155also used (via @code{alloca}), but this can be changed at build-time if
2156desired, see @ref{Build Options}.
2157
2158
2159@node Reentrancy, Useful Macros and Constants, Memory Management, GMP Basics
2160@section Reentrancy
2161@cindex Reentrancy
2162@cindex Thread safety
2163@cindex Multi-threading
2164
2165@noindent
2166GMP is reentrant and thread-safe, with some exceptions:
2167
2168@itemize @bullet
2169@item
2170If configured with @option{--enable-alloca=malloc-notreentrant} (or with
2171@option{--enable-alloca=notreentrant} when @code{alloca} is not available),
2172then naturally GMP is not reentrant.
2173
2174@item
2175@code{mpf_set_default_prec} and @code{mpf_init} use a global variable for the
2176selected precision. @code{mpf_init2} can be used instead, and in the C++
2177interface an explicit precision to the @code{mpf_class} constructor.
2178
2179@item
2180@code{mpz_random} and the other old random number functions use a global
2181random state and are hence not reentrant. The newer random number functions
2182that accept a @code{gmp_randstate_t} parameter can be used instead.
2183
2184@item
2185@code{gmp_randinit} (obsolete) returns an error indication through a global
2186variable, which is not thread safe. Applications are advised to use
2187@code{gmp_randinit_default} or @code{gmp_randinit_lc_2exp} instead.
2188
2189@item
2190@code{mp_set_memory_functions} uses global variables to store the selected
2191memory allocation functions.
2192
2193@item
2194If the memory allocation functions set by a call to
2195@code{mp_set_memory_functions} (or @code{malloc} and friends by default) are
2196not reentrant, then GMP will not be reentrant either.
2197
2198@item
2199If the standard I/O functions such as @code{fwrite} are not reentrant then the
2200GMP I/O functions using them will not be reentrant either.
2201
2202@item
2203It's safe for two threads to read from the same GMP variable simultaneously,
2204but it's not safe for one to read while another might be writing, nor for
2205two threads to write simultaneously. It's not safe for two threads to
2206generate a random number from the same @code{gmp_randstate_t} simultaneously,
2207since this involves an update of that variable.
2208@end itemize
2209
2210
2211@need 2000
2212@node Useful Macros and Constants, Compatibility with older versions, Reentrancy, GMP Basics
2213@section Useful Macros and Constants
2214@cindex Useful macros and constants
2215@cindex Constants
2216
2217@deftypevr {Global Constant} {const int} mp_bits_per_limb
2218@findex mp_bits_per_limb
2219@cindex Bits per limb
2220@cindex Limb size
2221The number of bits per limb.
2222@end deftypevr
2223
2224@defmac __GNU_MP_VERSION
2225@defmacx __GNU_MP_VERSION_MINOR
2226@defmacx __GNU_MP_VERSION_PATCHLEVEL
2227@cindex Version number
2228@cindex GMP version number
2229The major and minor GMP version, and patch level, respectively, as integers.
2230For GMP i.j, these numbers will be i, j, and 0, respectively.
2231For GMP i.j.k, these numbers will be i, j, and k, respectively.
2232@end defmac
2233
2234@deftypevr {Global Constant} {const char * const} gmp_version
2235@findex gmp_version
2236The GMP version number, as a null-terminated string, in the form ``i.j.k''.
2237This release is @nicode{"@value{VERSION}"}. Note that the format ``i.j'' was
2238used, before version 4.3.0, when k was zero.
2239@end deftypevr
2240
2241@defmac __GMP_CC
2242@defmacx __GMP_CFLAGS
2243The compiler and compiler flags, respectively, used when compiling GMP, as
2244strings.
2245@end defmac
2246
2247
2248@node Compatibility with older versions, Demonstration Programs, Useful Macros and Constants, GMP Basics
2249@section Compatibility with older versions
2250@cindex Compatibility with older versions
2251@cindex Past GMP versions
2252@cindex Upward compatibility
2253
2254This version of GMP is upwardly binary compatible with all 5.x, 4.x, and 3.x
2255versions, and upwardly compatible at the source level with all 2.x versions,
2256with the following exceptions.
2257
2258@itemize @bullet
2259@item
2260@code{mpn_gcd} had its source arguments swapped as of GMP 3.0, for consistency
2261with other @code{mpn} functions.
2262
2263@item
2264@code{mpf_get_prec} counted precision slightly differently in GMP 3.0 and
22653.0.1, but in 3.1 reverted to the 2.x style.
2266
2267@item
2268@code{mpn_bdivmod}, documented as preliminary in GMP 4, has been removed.
2269@end itemize
2270
2271There are a number of compatibility issues between GMP 1 and GMP 2 that of
2272course also apply when porting applications from GMP 1 to GMP 5. Please
2273see the GMP 2 manual for details.
2274
2275@c @item Integer division functions round the result differently. The obsolete
2276@c functions (@code{mpz_div}, @code{mpz_divmod}, @code{mpz_mdiv},
2277@c @code{mpz_mdivmod}, etc) now all use floor rounding (i.e., they round the
2278@c quotient towards
2279@c @ifinfo
2280@c @minus{}infinity).
2281@c @end ifinfo
2282@c @iftex
2283@c @tex
2284@c $-\infty$).
2285@c @end tex
2286@c @end iftex
2287@c There are a lot of functions for integer division, giving the user better
2288@c control over the rounding.
2289
2290@c @item The function @code{mpz_mod} now compute the true @strong{mod} function.
2291
2292@c @item The functions @code{mpz_powm} and @code{mpz_powm_ui} now use
2293@c @strong{mod} for reduction.
2294
2295@c @item The assignment functions for rational numbers do no longer canonicalize
2296@c their results. In the case a non-canonical result could arise from an
2297@c assignment, the user need to insert an explicit call to
2298@c @code{mpq_canonicalize}. This change was made for efficiency.
2299
2300@c @item Output generated by @code{mpz_out_raw} in this release cannot be read
2301@c by @code{mpz_inp_raw} in previous releases. This change was made for making
2302@c the file format truly portable between machines with different word sizes.
2303
2304@c @item Several @code{mpn} functions have changed. But they were intentionally
2305@c undocumented in previous releases.
2306
2307@c @item The functions @code{mpz_cmp_ui}, @code{mpz_cmp_si}, and @code{mpq_cmp_ui}
2308@c are now implemented as macros, and thereby sometimes evaluate their
2309@c arguments multiple times.
2310
2311@c @item The functions @code{mpz_pow_ui} and @code{mpz_ui_pow_ui} now yield 1
2312@c for 0^0. (In version 1, they yielded 0.)
2313
2314@c In version 1 of the library, @code{mpq_set_den} handled negative
2315@c denominators by copying the sign to the numerator. That is no longer done.
2316
2317@c Pure assignment functions do not canonicalize the assigned variable. It is
2318@c the responsibility of the user to canonicalize the assigned variable before
2319@c any arithmetic operations are performed on that variable.
2320@c Note that this is an incompatible change from version 1 of the library.
2321
2322@c @end enumerate
2323
2324
2325@need 1000
2326@node Demonstration Programs, Efficiency, Compatibility with older versions, GMP Basics
2327@section Demonstration programs
2328@cindex Demonstration programs
2329@cindex Example programs
2330@cindex Sample programs
2331The @file{demos} subdirectory has some sample programs using GMP@. These
2332aren't built or installed, but there's a @file{Makefile} with rules for them.
2333For instance,
2334
2335@example
2336make pexpr
2337./pexpr 68^975+10
2338@end example
2339
2340@noindent
2341The following programs are provided
2342
2343@itemize @bullet
2344@item
2345@cindex Expression parsing demo
2346@cindex Parsing expressions demo
2347@samp{pexpr} is an expression evaluator, the program used on the GMP web page.
2348@item
2349@cindex Expression parsing demo
2350@cindex Parsing expressions demo
2351The @samp{calc} subdirectory has a similar but simpler evaluator using
2352@command{lex} and @command{yacc}.
2353@item
2354@cindex Expression parsing demo
2355@cindex Parsing expressions demo
2356The @samp{expr} subdirectory is yet another expression evaluator, a library
2357designed for ease of use within a C program. See @file{demos/expr/README} for
2358more information.
2359@item
2360@cindex Factorization demo
2361@samp{factorize} is a Pollard-Rho factorization program.
2362@item
2363@samp{isprime} is a command-line interface to the @code{mpz_probab_prime_p}
2364function.
2365@item
2366@samp{primes} counts or lists primes in an interval, using a sieve.
2367@item
2368@samp{qcn} is an example use of @code{mpz_kronecker_ui} to estimate quadratic
2369class numbers.
2370@item
2371@cindex @code{perl}
2372@cindex GMP Perl module
2373@cindex Perl module
2374The @samp{perl} subdirectory is a comprehensive perl interface to GMP@. See
2375@file{demos/perl/INSTALL} for more information. Documentation is in POD
2376format in @file{demos/perl/GMP.pm}.
2377@end itemize
2378
2379As an aside, consideration has been given at various times to some sort of
2380expression evaluation within the main GMP library. Going beyond something
2381minimal quickly leads to matters like user-defined functions, looping, fixnums
2382for control variables, etc, which are considered outside the scope of GMP
2383(much closer to language interpreters or compilers, @xref{Language Bindings}.)
2384Something simple for program input convenience may yet be a possibility, a
2385combination of the @file{expr} demo and the @file{pexpr} tree back-end
2386perhaps. But for now the above evaluators are offered as illustrations.
2387
2388
2389@need 1000
2390@node Efficiency, Debugging, Demonstration Programs, GMP Basics
2391@section Efficiency
2392@cindex Efficiency
2393
2394@table @asis
2395@item Small Operands
2396@cindex Small operands
2397On small operands, the time for function call overheads and memory allocation
2398can be significant in comparison to actual calculation. This is unavoidable
2399in a general purpose variable precision library, although GMP attempts to be
2400as efficient as it can on both large and small operands.
2401
2402@item Static Linking
2403@cindex Static linking
2404On some CPUs, in particular the x86s, the static @file{libgmp.a} should be
2405used for maximum speed, since the PIC code in the shared @file{libgmp.so} will
2406have a small overhead on each function call and global data address. For many
2407programs this will be insignificant, but for long calculations there's a gain
2408to be had.
2409
2410@item Initializing and Clearing
2411@cindex Initializing and clearing
2412Avoid excessive initializing and clearing of variables, since this can be
2413quite time consuming, especially in comparison to otherwise fast operations
2414like addition.
2415
2416A language interpreter might want to keep a free list or stack of
2417initialized variables ready for use. It should be possible to integrate
2418something like that with a garbage collector too.
2419
2420@item Reallocations
2421@cindex Reallocations
2422An @code{mpz_t} or @code{mpq_t} variable used to hold successively increasing
2423values will have its memory repeatedly @code{realloc}ed, which could be quite
2424slow or could fragment memory, depending on the C library. If an application
2425can estimate the final size then @code{mpz_init2} or @code{mpz_realloc2} can
2426be called to allocate the necessary space from the beginning
2427(@pxref{Initializing Integers}).
2428
2429It doesn't matter if a size set with @code{mpz_init2} or @code{mpz_realloc2}
2430is too small, since all functions will do a further reallocation if necessary.
2431Badly overestimating memory required will waste space though.
2432
2433@item @code{2exp} Functions
2434@cindex @code{2exp} functions
2435It's up to an application to call functions like @code{mpz_mul_2exp} when
2436appropriate. General purpose functions like @code{mpz_mul} make no attempt to
2437identify powers of two or other special forms, because such inputs will
2438usually be very rare and testing every time would be wasteful.
2439
2440@item @code{ui} and @code{si} Functions
2441@cindex @code{ui} and @code{si} functions
2442The @code{ui} functions and the small number of @code{si} functions exist for
2443convenience and should be used where applicable. But if for example an
2444@code{mpz_t} contains a value that fits in an @code{unsigned long} there's no
2445need extract it and call a @code{ui} function, just use the regular @code{mpz}
2446function.
2447
2448@item In-Place Operations
2449@cindex In-place operations
2450@code{mpz_abs}, @code{mpq_abs}, @code{mpf_abs}, @code{mpz_neg}, @code{mpq_neg}
2451and @code{mpf_neg} are fast when used for in-place operations like
2452@code{mpz_abs(x,x)}, since in the current implementation only a single field
2453of @code{x} needs changing. On suitable compilers (GCC for instance) this is
2454inlined too.
2455
2456@code{mpz_add_ui}, @code{mpz_sub_ui}, @code{mpf_add_ui} and @code{mpf_sub_ui}
2457benefit from an in-place operation like @code{mpz_add_ui(x,x,y)}, since
2458usually only one or two limbs of @code{x} will need to be changed. The same
2459applies to the full precision @code{mpz_add} etc if @code{y} is small. If
2460@code{y} is big then cache locality may be helped, but that's all.
2461
2462@code{mpz_mul} is currently the opposite, a separate destination is slightly
2463better. A call like @code{mpz_mul(x,x,y)} will, unless @code{y} is only one
2464limb, make a temporary copy of @code{x} before forming the result. Normally
2465that copying will only be a tiny fraction of the time for the multiply, so
2466this is not a particularly important consideration.
2467
2468@code{mpz_set}, @code{mpq_set}, @code{mpq_set_num}, @code{mpf_set}, etc, make
2469no attempt to recognise a copy of something to itself, so a call like
2470@code{mpz_set(x,x)} will be wasteful. Naturally that would never be written
2471deliberately, but if it might arise from two pointers to the same object then
2472a test to avoid it might be desirable.
2473
2474@example
2475if (x != y)
2476 mpz_set (x, y);
2477@end example
2478
2479Note that it's never worth introducing extra @code{mpz_set} calls just to get
2480in-place operations. If a result should go to a particular variable then just
2481direct it there and let GMP take care of data movement.
2482
2483@item Divisibility Testing (Small Integers)
2484@cindex Divisibility testing
2485@code{mpz_divisible_ui_p} and @code{mpz_congruent_ui_p} are the best functions
2486for testing whether an @code{mpz_t} is divisible by an individual small
2487integer. They use an algorithm which is faster than @code{mpz_tdiv_ui}, but
2488which gives no useful information about the actual remainder, only whether
2489it's zero (or a particular value).
2490
2491However when testing divisibility by several small integers, it's best to take
2492a remainder modulo their product, to save multi-precision operations. For
2493instance to test whether a number is divisible by any of 23, 29 or 31 take a
2494remainder modulo @math{23@times{}29@times{}31 = 20677} and then test that.
2495
2496The division functions like @code{mpz_tdiv_q_ui} which give a quotient as well
2497as a remainder are generally a little slower than the remainder-only functions
2498like @code{mpz_tdiv_ui}. If the quotient is only rarely wanted then it's
2499probably best to just take a remainder and then go back and calculate the
2500quotient if and when it's wanted (@code{mpz_divexact_ui} can be used if the
2501remainder is zero).
2502
2503@item Rational Arithmetic
2504@cindex Rational arithmetic
2505The @code{mpq} functions operate on @code{mpq_t} values with no common factors
2506in the numerator and denominator. Common factors are checked-for and cast out
2507as necessary. In general, cancelling factors every time is the best approach
2508since it minimizes the sizes for subsequent operations.
2509
2510However, applications that know something about the factorization of the
2511values they're working with might be able to avoid some of the GCDs used for
2512canonicalization, or swap them for divisions. For example when multiplying by
2513a prime it's enough to check for factors of it in the denominator instead of
2514doing a full GCD@. Or when forming a big product it might be known that very
2515little cancellation will be possible, and so canonicalization can be left to
2516the end.
2517
2518The @code{mpq_numref} and @code{mpq_denref} macros give access to the
2519numerator and denominator to do things outside the scope of the supplied
2520@code{mpq} functions. @xref{Applying Integer Functions}.
2521
2522The canonical form for rationals allows mixed-type @code{mpq_t} and integer
2523additions or subtractions to be done directly with multiples of the
2524denominator. This will be somewhat faster than @code{mpq_add}. For example,
2525
2526@example
2527/* mpq increment */
2528mpz_add (mpq_numref(q), mpq_numref(q), mpq_denref(q));
2529
2530/* mpq += unsigned long */
2531mpz_addmul_ui (mpq_numref(q), mpq_denref(q), 123UL);
2532
2533/* mpq -= mpz */
2534mpz_submul (mpq_numref(q), mpq_denref(q), z);
2535@end example
2536
2537@item Number Sequences
2538@cindex Number sequences
2539Functions like @code{mpz_fac_ui}, @code{mpz_fib_ui} and @code{mpz_bin_uiui}
2540are designed for calculating isolated values. If a range of values is wanted
2541it's probably best to call to get a starting point and iterate from there.
2542
2543@item Text Input/Output
2544@cindex Text input/output
2545Hexadecimal or octal are suggested for input or output in text form.
2546Power-of-2 bases like these can be converted much more efficiently than other
2547bases, like decimal. For big numbers there's usually nothing of particular
2548interest to be seen in the digits, so the base doesn't matter much.
2549
2550Maybe we can hope octal will one day become the normal base for everyday use,
2551as proposed by King Charles XII of Sweden and later reformers.
2552@c Reference: Knuth volume 2 section 4.1, page 184 of second edition. :-)
2553@end table
2554
2555
2556@node Debugging, Profiling, Efficiency, GMP Basics
2557@section Debugging
2558@cindex Debugging
2559
2560@table @asis
2561@item Stack Overflow
2562@cindex Stack overflow
2563@cindex Segmentation violation
2564@cindex Bus error
2565Depending on the system, a segmentation violation or bus error might be the
2566only indication of stack overflow. See @samp{--enable-alloca} choices in
2567@ref{Build Options}, for how to address this.
2568
2569In new enough versions of GCC, @samp{-fstack-check} may be able to ensure an
2570overflow is recognised by the system before too much damage is done, or
2571@samp{-fstack-limit-symbol} or @samp{-fstack-limit-register} may be able to
2572add checking if the system itself doesn't do any (@pxref{Code Gen Options,,
2573Options for Code Generation, gcc, Using the GNU Compiler Collection (GCC)}).
2574These options must be added to the @samp{CFLAGS} used in the GMP build
2575(@pxref{Build Options}), adding them just to an application will have no
2576effect. Note also they're a slowdown, adding overhead to each function call
2577and each stack allocation.
2578
2579@item Heap Problems
2580@cindex Heap problems
2581@cindex Malloc problems
2582The most likely cause of application problems with GMP is heap corruption.
2583Failing to @code{init} GMP variables will have unpredictable effects, and
2584corruption arising elsewhere in a program may well affect GMP@. Initializing
2585GMP variables more than once or failing to clear them will cause memory leaks.
2586
2587@cindex Malloc debugger
2588In all such cases a @code{malloc} debugger is recommended. On a GNU or BSD
2589system the standard C library @code{malloc} has some diagnostic facilities,
2590see @ref{Allocation Debugging,, Allocation Debugging, libc, The GNU C Library
2591Reference Manual}, or @samp{man 3 malloc}. Other possibilities, in no
2592particular order, include
2593
2594@display
2595@uref{http://cs.ecs.baylor.edu/~donahoo/tools/ccmalloc/}
2596@uref{http://dmalloc.com/}
2597@uref{https://wiki.gnome.org/Apps/MemProf}
2598@end display
2599
2600The GMP default allocation routines in @file{memory.c} also have a simple
2601sentinel scheme which can be enabled with @code{#define DEBUG} in that file.
2602This is mainly designed for detecting buffer overruns during GMP development,
2603but might find other uses.
2604
2605@item Stack Backtraces
2606@cindex Stack backtrace
2607On some systems the compiler options GMP uses by default can interfere with
2608debugging. In particular on x86 and 68k systems @samp{-fomit-frame-pointer}
2609is used and this generally inhibits stack backtracing. Recompiling without
2610such options may help while debugging, though the usual caveats about it
2611potentially moving a memory problem or hiding a compiler bug will apply.
2612
2613@item GDB, the GNU Debugger
2614@cindex GDB
2615@cindex GNU Debugger
2616A sample @file{.gdbinit} is included in the distribution, showing how to call
2617some undocumented dump functions to print GMP variables from within GDB@. Note
2618that these functions shouldn't be used in final application code since they're
2619undocumented and may be subject to incompatible changes in future versions of
2620GMP.
2621
2622@item Source File Paths
2623GMP has multiple source files with the same name, in different directories.
2624For example @file{mpz}, @file{mpq} and @file{mpf} each have an
2625@file{init.c}. If the debugger can't already determine the right one it may
2626help to build with absolute paths on each C file. One way to do that is to
2627use a separate object directory with an absolute path to the source directory.
2628
2629@example
2630cd /my/build/dir
2631/my/source/dir/gmp-@value{VERSION}/configure
2632@end example
2633
2634This works via @code{VPATH}, and might require GNU @command{make}.
2635Alternately it might be possible to change the @code{.c.lo} rules
2636appropriately.
2637
2638@item Assertion Checking
2639@cindex Assertion checking
2640The build option @option{--enable-assert} is available to add some consistency
2641checks to the library (see @ref{Build Options}). These are likely to be of
2642limited value to most applications. Assertion failures are just as likely to
2643indicate memory corruption as a library or compiler bug.
2644
2645Applications using the low-level @code{mpn} functions, however, will benefit
2646from @option{--enable-assert} since it adds checks on the parameters of most
2647such functions, many of which have subtle restrictions on their usage. Note
2648however that only the generic C code has checks, not the assembly code, so
2649@option{--disable-assembly} should be used for maximum checking.
2650
2651@item Temporary Memory Checking
2652The build option @option{--enable-alloca=debug} arranges that each block of
2653temporary memory in GMP is allocated with a separate call to @code{malloc} (or
2654the allocation function set with @code{mp_set_memory_functions}).
2655
2656This can help a malloc debugger detect accesses outside the intended bounds,
2657or detect memory not released. In a normal build, on the other hand,
2658temporary memory is allocated in blocks which GMP divides up for its own use,
2659or may be allocated with a compiler builtin @code{alloca} which will go
2660nowhere near any malloc debugger hooks.
2661
2662@item Maximum Debuggability
2663To summarize the above, a GMP build for maximum debuggability would be
2664
2665@example
2666./configure --disable-shared --enable-assert \
2667 --enable-alloca=debug --disable-assembly CFLAGS=-g
2668@end example
2669
2670For C++, add @samp{--enable-cxx CXXFLAGS=-g}.
2671
2672@item Checker
2673@cindex Checker
2674@cindex GCC Checker
2675The GCC checker (@uref{https://savannah.nongnu.org/projects/checker/}) can be
2676used with GMP@. It contains a stub library which means GMP applications
2677compiled with checker can use a normal GMP build.
2678
2679A build of GMP with checking within GMP itself can be made. This will run
2680very very slowly. On GNU/Linux for example,
2681
2682@cindex @command{checkergcc}
2683@example
2684./configure --disable-assembly CC=checkergcc
2685@end example
2686
2687@option{--disable-assembly} must be used, since the GMP assembly code doesn't
2688support the checking scheme. The GMP C++ features cannot be used, since
2689current versions of checker (0.9.9.1) don't yet support the standard C++
2690library.
2691
2692@item Valgrind
2693@cindex Valgrind
2694Valgrind (@uref{http://valgrind.org/}) is a memory checker for x86, ARM, MIPS,
2695PowerPC, and S/390. It translates and emulates machine instructions to do
2696strong checks for uninitialized data (at the level of individual bits), memory
2697accesses through bad pointers, and memory leaks.
2698
2699Valgrind does not always support every possible instruction, in particular
2700ones recently added to an ISA. Valgrind might therefore be incompatible with
2701a recent GMP or even a less recent GMP which is compiled using a recent GCC.
2702
2703GMP's assembly code sometimes promotes a read of the limbs to some larger size,
2704for efficiency. GMP will do this even at the start and end of a multilimb
2705operand, using naturally aligned operations on the larger type. This may lead
2706to benign reads outside of allocated areas, triggering complaints from
2707Valgrind. Valgrind's option @samp{--partial-loads-ok=yes} should help.
2708
2709@item Other Problems
2710Any suspected bug in GMP itself should be isolated to make sure it's not an
2711application problem, see @ref{Reporting Bugs}.
2712@end table
2713
2714
2715@node Profiling, Autoconf, Debugging, GMP Basics
2716@section Profiling
2717@cindex Profiling
2718@cindex Execution profiling
2719@cindex @code{--enable-profiling}
2720
2721Running a program under a profiler is a good way to find where it's spending
2722most time and where improvements can be best sought. The profiling choices
2723for a GMP build are as follows.
2724
2725@table @asis
2726@item @samp{--disable-profiling}
2727The default is to add nothing special for profiling.
2728
2729It should be possible to just compile the mainline of a program with @code{-p}
2730and use @command{prof} to get a profile consisting of timer-based sampling of
2731the program counter. Most of the GMP assembly code has the necessary symbol
2732information.
2733
2734This approach has the advantage of minimizing interference with normal program
2735operation, but on most systems the resolution of the sampling is quite low (10
2736milliseconds for instance), requiring long runs to get accurate information.
2737
2738@item @samp{--enable-profiling=prof}
2739@cindex @code{prof}
2740Build with support for the system @command{prof}, which means @samp{-p} added
2741to the @samp{CFLAGS}.
2742
2743This provides call counting in addition to program counter sampling, which
2744allows the most frequently called routines to be identified, and an average
2745time spent in each routine to be determined.
2746
2747The x86 assembly code has support for this option, but on other processors
2748the assembly routines will be as if compiled without @samp{-p} and therefore
2749won't appear in the call counts.
2750
2751On some systems, such as GNU/Linux, @samp{-p} in fact means @samp{-pg} and in
2752this case @samp{--enable-profiling=gprof} described below should be used
2753instead.
2754
2755@item @samp{--enable-profiling=gprof}
2756@cindex @code{gprof}
2757Build with support for @command{gprof}, which means @samp{-pg} added to the
2758@samp{CFLAGS}.
2759
2760This provides call graph construction in addition to call counting and program
2761counter sampling, which makes it possible to count calls coming from different
2762locations. For example the number of calls to @code{mpn_mul} from
2763@code{mpz_mul} versus the number from @code{mpf_mul}. The program counter
2764sampling is still flat though, so only a total time in @code{mpn_mul} would be
2765accumulated, not a separate amount for each call site.
2766
2767The x86 assembly code has support for this option, but on other processors
2768the assembly routines will be as if compiled without @samp{-pg} and therefore
2769not be included in the call counts.
2770
2771On x86 and m68k systems @samp{-pg} and @samp{-fomit-frame-pointer} are
2772incompatible, so the latter is omitted from the default flags in that case,
2773which might result in poorer code generation.
2774
2775Incidentally, it should be possible to use the @command{gprof} program with a
2776plain @samp{--enable-profiling=prof} build. But in that case only the
2777@samp{gprof -p} flat profile and call counts can be expected to be valid, not
2778the @samp{gprof -q} call graph.
2779
2780@item @samp{--enable-profiling=instrument}
2781@cindex @code{-finstrument-functions}
2782@cindex @code{instrument-functions}
2783Build with the GCC option @samp{-finstrument-functions} added to the
2784@samp{CFLAGS} (@pxref{Code Gen Options,, Options for Code Generation, gcc,
2785Using the GNU Compiler Collection (GCC)}).
2786
2787This inserts special instrumenting calls at the start and end of each
2788function, allowing exact timing and full call graph construction.
2789
2790This instrumenting is not normally a standard system feature and will require
2791support from an external library, such as
2792
2793@cindex FunctionCheck
2794@cindex fnccheck
2795@display
2796@uref{https://sourceforge.net/projects/fnccheck/}
2797@end display
2798
2799This should be included in @samp{LIBS} during the GMP configure so that test
2800programs will link. For example,
2801
2802@example
2803./configure --enable-profiling=instrument LIBS=-lfc
2804@end example
2805
2806On a GNU system the C library provides dummy instrumenting functions, so
2807programs compiled with this option will link. In this case it's only
2808necessary to ensure the correct library is added when linking an application.
2809
2810The x86 assembly code supports this option, but on other processors the
2811assembly routines will be as if compiled without
2812@samp{-finstrument-functions} meaning time spent in them will effectively be
2813attributed to their caller.
2814@end table
2815
2816
2817@node Autoconf, Emacs, Profiling, GMP Basics
2818@section Autoconf
2819@cindex Autoconf
2820
2821Autoconf based applications can easily check whether GMP is installed. The
2822only thing to be noted is that GMP library symbols from version 3 onwards have
2823prefixes like @code{__gmpz}. The following therefore would be a simple test,
2824
2825@cindex @code{AC_CHECK_LIB}
2826@example
2827AC_CHECK_LIB(gmp, __gmpz_init)
2828@end example
2829
2830This just uses the default @code{AC_CHECK_LIB} actions for found or not found,
2831but an application that must have GMP would want to generate an error if not
2832found. For example,
2833
2834@example
2835AC_CHECK_LIB(gmp, __gmpz_init, ,
2836 [AC_MSG_ERROR([GNU MP not found, see https://gmplib.org/])])
2837@end example
2838
2839If functions added in some particular version of GMP are required, then one of
2840those can be used when checking. For example @code{mpz_mul_si} was added in
2841GMP 3.1,
2842
2843@example
2844AC_CHECK_LIB(gmp, __gmpz_mul_si, ,
2845 [AC_MSG_ERROR(
2846 [GNU MP not found, or not 3.1 or up, see https://gmplib.org/])])
2847@end example
2848
2849An alternative would be to test the version number in @file{gmp.h} using say
2850@code{AC_EGREP_CPP}. That would make it possible to test the exact version,
2851if some particular sub-minor release is known to be necessary.
2852
2853In general it's recommended that applications should simply demand a new
2854enough GMP rather than trying to provide supplements for features not
2855available in past versions.
2856
2857Occasionally an application will need or want to know the size of a type at
2858configuration or preprocessing time, not just with @code{sizeof} in the code.
2859This can be done in the normal way with @code{mp_limb_t} etc, but GMP 4.0 or
2860up is best for this, since prior versions needed certain @samp{-D} defines on
2861systems using a @code{long long} limb. The following would suit Autoconf 2.50
2862or up,
2863
2864@example
2865AC_CHECK_SIZEOF(mp_limb_t, , [#include <gmp.h>])
2866@end example
2867
2868
2869@node Emacs, , Autoconf, GMP Basics
2870@section Emacs
2871@cindex Emacs
2872@cindex @code{info-lookup-symbol}
2873
2874@key{C-h C-i} (@code{info-lookup-symbol}) is a good way to find documentation
2875on C functions while editing (@pxref{Info Lookup, , Info Documentation Lookup,
2876emacs, The Emacs Editor}).
2877
2878The GMP manual can be included in such lookups by putting the following in
2879your @file{.emacs},
2880
2881@c This isn't pretty, but there doesn't seem to be a better way (in emacs
2882@c 21.2 at least). info-lookup->mode-value could be used for the "assoc"s,
2883@c but that function isn't documented, whereas info-lookup-alist is.
2884@c
2885@example
2886(eval-after-load "info-look"
2887 '(let ((mode-value (assoc 'c-mode (assoc 'symbol info-lookup-alist))))
2888 (setcar (nthcdr 3 mode-value)
2889 (cons '("(gmp)Function Index" nil "^ -.* " "\\>")
2890 (nth 3 mode-value)))))
2891@end example
2892
2893
2894@node Reporting Bugs, Integer Functions, GMP Basics, Top
2895@comment node-name, next, previous, up
2896@chapter Reporting Bugs
2897@cindex Reporting bugs
2898@cindex Bug reporting
2899
2900If you think you have found a bug in the GMP library, please investigate it
2901and report it. We have made this library available to you, and it is not too
2902much to ask you to report the bugs you find.
2903
2904Before you report a bug, check it's not already addressed in @ref{Known Build
2905Problems}, or perhaps @ref{Notes for Particular Systems}. You may also want
2906to check @uref{https://gmplib.org/} for patches for this release.
2907
2908Please include the following in any report,
2909
2910@itemize @bullet
2911@item
2912The GMP version number, and if pre-packaged or patched then say so.
2913
2914@item
2915A test program that makes it possible for us to reproduce the bug. Include
2916instructions on how to run the program.
2917
2918@item
2919A description of what is wrong. If the results are incorrect, in what way.
2920If you get a crash, say so.
2921
2922@item
2923If you get a crash, include a stack backtrace from the debugger if it's
2924informative (@samp{where} in @command{gdb}, or @samp{$C} in @command{adb}).
2925
2926@item
2927Please do not send core dumps, executables or @command{strace}s.
2928
2929@item
2930The @samp{configure} options you used when building GMP, if any.
2931
2932@item
2933The output from @samp{configure}, as printed to stdout, with any options used.
2934
2935@item
2936The name of the compiler and its version. For @command{gcc}, get the version
2937with @samp{gcc -v}, otherwise perhaps @samp{what `which cc`}, or similar.
2938
2939@item
2940The output from running @samp{uname -a}.
2941
2942@item
2943The output from running @samp{./config.guess}, and from running
2944@samp{./configfsf.guess} (might be the same).
2945
2946@item
2947If the bug is related to @samp{configure}, then the compressed contents of
2948@file{config.log}.
2949
2950@item
2951If the bug is related to an @file{asm} file not assembling, then the contents
2952of @file{config.m4} and the offending line or lines from the temporary
2953@file{mpn/tmp-<file>.s}.
2954@end itemize
2955
2956Please make an effort to produce a self-contained report, with something
2957definite that can be tested or debugged. Vague queries or piecemeal messages
2958are difficult to act on and don't help the development effort.
2959
2960It is not uncommon that an observed problem is actually due to a bug in the
2961compiler; the GMP code tends to explore interesting corners in compilers.
2962
2963If your bug report is good, we will do our best to help you get a corrected
2964version of the library; if the bug report is poor, we won't do anything about
2965it (except maybe ask you to send a better report).
2966
2967Send your report to: @email{gmp-bugs@@gmplib.org}.
2968
2969If you think something in this manual is unclear, or downright incorrect, or if
2970the language needs to be improved, please send a note to the same address.
2971
2972
2973@node Integer Functions, Rational Number Functions, Reporting Bugs, Top
2974@comment node-name, next, previous, up
2975@chapter Integer Functions
2976@cindex Integer functions
2977
2978This chapter describes the GMP functions for performing integer arithmetic.
2979These functions start with the prefix @code{mpz_}.
2980
2981GMP integers are stored in objects of type @code{mpz_t}.
2982
2983@menu
2984* Initializing Integers::
2985* Assigning Integers::
2986* Simultaneous Integer Init & Assign::
2987* Converting Integers::
2988* Integer Arithmetic::
2989* Integer Division::
2990* Integer Exponentiation::
2991* Integer Roots::
2992* Number Theoretic Functions::
2993* Integer Comparisons::
2994* Integer Logic and Bit Fiddling::
2995* I/O of Integers::
2996* Integer Random Numbers::
2997* Integer Import and Export::
2998* Miscellaneous Integer Functions::
2999* Integer Special Functions::
3000@end menu
3001
3002@node Initializing Integers, Assigning Integers, Integer Functions, Integer Functions
3003@comment node-name, next, previous, up
3004@section Initialization Functions
3005@cindex Integer initialization functions
3006@cindex Initialization functions
3007
3008The functions for integer arithmetic assume that all integer objects are
3009initialized. You do that by calling the function @code{mpz_init}. For
3010example,
3011
3012@example
3013@{
3014 mpz_t integ;
3015 mpz_init (integ);
3016 @dots{}
3017 mpz_add (integ, @dots{});
3018 @dots{}
3019 mpz_sub (integ, @dots{});
3020
3021 /* Unless the program is about to exit, do ... */
3022 mpz_clear (integ);
3023@}
3024@end example
3025
3026As you can see, you can store new values any number of times, once an
3027object is initialized.
3028
3029@deftypefun void mpz_init (mpz_t @var{x})
3030Initialize @var{x}, and set its value to 0.
3031@end deftypefun
3032
3033@deftypefun void mpz_inits (mpz_t @var{x}, ...)
3034Initialize a NULL-terminated list of @code{mpz_t} variables, and set their
3035values to 0.
3036@end deftypefun
3037
3038@deftypefun void mpz_init2 (mpz_t @var{x}, mp_bitcnt_t @var{n})
3039Initialize @var{x}, with space for @var{n}-bit numbers, and set its value to 0.
3040Calling this function instead of @code{mpz_init} or @code{mpz_inits} is never
3041necessary; reallocation is handled automatically by GMP when needed.
3042
3043While @var{n} defines the initial space, @var{x} will grow automatically in the
3044normal way, if necessary, for subsequent values stored. @code{mpz_init2} makes
3045it possible to avoid such reallocations if a maximum size is known in advance.
3046
3047In preparation for an operation, GMP often allocates one limb more than
3048ultimately needed. To make sure GMP will not perform reallocation for
3049@var{x}, you need to add the number of bits in @code{mp_limb_t} to @var{n}.
3050@end deftypefun
3051
3052@deftypefun void mpz_clear (mpz_t @var{x})
3053Free the space occupied by @var{x}. Call this function for all @code{mpz_t}
3054variables when you are done with them.
3055@end deftypefun
3056
3057@deftypefun void mpz_clears (mpz_t @var{x}, ...)
3058Free the space occupied by a NULL-terminated list of @code{mpz_t} variables.
3059@end deftypefun
3060
3061@deftypefun void mpz_realloc2 (mpz_t @var{x}, mp_bitcnt_t @var{n})
3062Change the space allocated for @var{x} to @var{n} bits. The value in @var{x}
3063is preserved if it fits, or is set to 0 if not.
3064
3065Calling this function is never necessary; reallocation is handled automatically
3066by GMP when needed. But this function can be used to increase the space for a
3067variable in order to avoid repeated automatic reallocations, or to decrease it
3068to give memory back to the heap.
3069@end deftypefun
3070
3071
3072@node Assigning Integers, Simultaneous Integer Init & Assign, Initializing Integers, Integer Functions
3073@comment node-name, next, previous, up
3074@section Assignment Functions
3075@cindex Integer assignment functions
3076@cindex Assignment functions
3077
3078These functions assign new values to already initialized integers
3079(@pxref{Initializing Integers}).
3080
3081@deftypefun void mpz_set (mpz_t @var{rop}, const mpz_t @var{op})
3082@deftypefunx void mpz_set_ui (mpz_t @var{rop}, unsigned long int @var{op})
3083@deftypefunx void mpz_set_si (mpz_t @var{rop}, signed long int @var{op})
3084@deftypefunx void mpz_set_d (mpz_t @var{rop}, double @var{op})
3085@deftypefunx void mpz_set_q (mpz_t @var{rop}, const mpq_t @var{op})
3086@deftypefunx void mpz_set_f (mpz_t @var{rop}, const mpf_t @var{op})
3087Set the value of @var{rop} from @var{op}.
3088
3089@code{mpz_set_d}, @code{mpz_set_q} and @code{mpz_set_f} truncate @var{op} to
3090make it an integer.
3091@end deftypefun
3092
3093@deftypefun int mpz_set_str (mpz_t @var{rop}, const char *@var{str}, int @var{base})
3094Set the value of @var{rop} from @var{str}, a null-terminated C string in base
3095@var{base}. White space is allowed in the string, and is simply ignored.
3096
3097The @var{base} may vary from 2 to 62, or if @var{base} is 0, then the leading
3098characters are used: @code{0x} and @code{0X} for hexadecimal, @code{0b} and
3099@code{0B} for binary, @code{0} for octal, or decimal otherwise.
3100
3101For bases up to 36, case is ignored; upper-case and lower-case letters have
3102the same value. For bases 37 to 62, upper-case letter represent the usual
310310..35 while lower-case letter represent 36..61.
3104
3105This function returns 0 if the entire string is a valid number in base
3106@var{base}. Otherwise it returns @minus{}1.
3107@c
3108@c It turns out that it is not entirely true that this function ignores
3109@c white-space. It does ignore it between digits, but not after a minus sign
3110@c or within or after ``0x''. Some thought was given to disallowing all
3111@c whitespace, but that would be an incompatible change, whitespace has been
3112@c documented as ignored ever since GMP 1.
3113@c
3114@end deftypefun
3115
3116@deftypefun void mpz_swap (mpz_t @var{rop1}, mpz_t @var{rop2})
3117Swap the values @var{rop1} and @var{rop2} efficiently.
3118@end deftypefun
3119
3120
3121@node Simultaneous Integer Init & Assign, Converting Integers, Assigning Integers, Integer Functions
3122@comment node-name, next, previous, up
3123@section Combined Initialization and Assignment Functions
3124@cindex Integer assignment functions
3125@cindex Assignment functions
3126@cindex Integer initialization functions
3127@cindex Initialization functions
3128
3129For convenience, GMP provides a parallel series of initialize-and-set functions
3130which initialize the output and then store the value there. These functions'
3131names have the form @code{mpz_init_set@dots{}}
3132
3133Here is an example of using one:
3134
3135@example
3136@{
3137 mpz_t pie;
3138 mpz_init_set_str (pie, "3141592653589793238462643383279502884", 10);
3139 @dots{}
3140 mpz_sub (pie, @dots{});
3141 @dots{}
3142 mpz_clear (pie);
3143@}
3144@end example
3145
3146@noindent
3147Once the integer has been initialized by any of the @code{mpz_init_set@dots{}}
3148functions, it can be used as the source or destination operand for the ordinary
3149integer functions. Don't use an initialize-and-set function on a variable
3150already initialized!
3151
3152@deftypefun void mpz_init_set (mpz_t @var{rop}, const mpz_t @var{op})
3153@deftypefunx void mpz_init_set_ui (mpz_t @var{rop}, unsigned long int @var{op})
3154@deftypefunx void mpz_init_set_si (mpz_t @var{rop}, signed long int @var{op})
3155@deftypefunx void mpz_init_set_d (mpz_t @var{rop}, double @var{op})
3156Initialize @var{rop} with limb space and set the initial numeric value from
3157@var{op}.
3158@end deftypefun
3159
3160@deftypefun int mpz_init_set_str (mpz_t @var{rop}, const char *@var{str}, int @var{base})
3161Initialize @var{rop} and set its value like @code{mpz_set_str} (see its
3162documentation above for details).
3163
3164If the string is a correct base @var{base} number, the function returns 0;
3165if an error occurs it returns @minus{}1. @var{rop} is initialized even if
3166an error occurs. (I.e., you have to call @code{mpz_clear} for it.)
3167@end deftypefun
3168
3169
3170@node Converting Integers, Integer Arithmetic, Simultaneous Integer Init & Assign, Integer Functions
3171@comment node-name, next, previous, up
3172@section Conversion Functions
3173@cindex Integer conversion functions
3174@cindex Conversion functions
3175
3176This section describes functions for converting GMP integers to standard C
3177types. Functions for converting @emph{to} GMP integers are described in
3178@ref{Assigning Integers} and @ref{I/O of Integers}.
3179
3180@deftypefun {unsigned long int} mpz_get_ui (const mpz_t @var{op})
3181Return the value of @var{op} as an @code{unsigned long}.
3182
3183If @var{op} is too big to fit an @code{unsigned long} then just the least
3184significant bits that do fit are returned. The sign of @var{op} is ignored,
3185only the absolute value is used.
3186@end deftypefun
3187
3188@deftypefun {signed long int} mpz_get_si (const mpz_t @var{op})
3189If @var{op} fits into a @code{signed long int} return the value of @var{op}.
3190Otherwise return the least significant part of @var{op}, with the same sign
3191as @var{op}.
3192
3193If @var{op} is too big to fit in a @code{signed long int}, the returned
3194result is probably not very useful. To find out if the value will fit, use
3195the function @code{mpz_fits_slong_p}.
3196@end deftypefun
3197
3198@deftypefun double mpz_get_d (const mpz_t @var{op})
3199Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
3200towards zero).
3201
3202If the exponent from the conversion is too big, the result is system
3203dependent. An infinity is returned where available. A hardware overflow trap
3204may or may not occur.
3205@end deftypefun
3206
3207@deftypefun double mpz_get_d_2exp (signed long int *@var{exp}, const mpz_t @var{op})
3208Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
3209towards zero), and returning the exponent separately.
3210
3211The return value is in the range @math{0.5@le{}@GMPabs{@var{d}}<1} and the
3212exponent is stored to @code{*@var{exp}}. @m{@var{d} * 2^{exp}, @var{d} *
32132^@var{exp}} is the (truncated) @var{op} value. If @var{op} is zero, the
3214return is @math{0.0} and 0 is stored to @code{*@var{exp}}.
3215
3216@cindex @code{frexp}
3217This is similar to the standard C @code{frexp} function (@pxref{Normalization
3218Functions,,, libc, The GNU C Library Reference Manual}).
3219@end deftypefun
3220
3221@deftypefun {char *} mpz_get_str (char *@var{str}, int @var{base}, const mpz_t @var{op})
3222Convert @var{op} to a string of digits in base @var{base}. The base argument
3223may vary from 2 to 62 or from @minus{}2 to @minus{}36.
3224
3225For @var{base} in the range 2..36, digits and lower-case letters are used; for
3226@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
3227digits, upper-case letters, and lower-case letters (in that significance order)
3228are used.
3229
3230If @var{str} is @code{NULL}, the result string is allocated using the current
3231allocation function (@pxref{Custom Allocation}). The block will be
3232@code{strlen(str)+1} bytes, that being exactly enough for the string and
3233null-terminator.
3234
3235If @var{str} is not @code{NULL}, it should point to a block of storage large
3236enough for the result, that being @code{mpz_sizeinbase (@var{op}, @var{base})
3237+ 2}. The two extra bytes are for a possible minus sign, and the
3238null-terminator.
3239
3240A pointer to the result string is returned, being either the allocated block,
3241or the given @var{str}.
3242@end deftypefun
3243
3244
3245@need 2000
3246@node Integer Arithmetic, Integer Division, Converting Integers, Integer Functions
3247@comment node-name, next, previous, up
3248@section Arithmetic Functions
3249@cindex Integer arithmetic functions
3250@cindex Arithmetic functions
3251
3252@deftypefun void mpz_add (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3253@deftypefunx void mpz_add_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
3254Set @var{rop} to @math{@var{op1} + @var{op2}}.
3255@end deftypefun
3256
3257@deftypefun void mpz_sub (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3258@deftypefunx void mpz_sub_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
3259@deftypefunx void mpz_ui_sub (mpz_t @var{rop}, unsigned long int @var{op1}, const mpz_t @var{op2})
3260Set @var{rop} to @var{op1} @minus{} @var{op2}.
3261@end deftypefun
3262
3263@deftypefun void mpz_mul (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3264@deftypefunx void mpz_mul_si (mpz_t @var{rop}, const mpz_t @var{op1}, long int @var{op2})
3265@deftypefunx void mpz_mul_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
3266Set @var{rop} to @math{@var{op1} @GMPtimes{} @var{op2}}.
3267@end deftypefun
3268
3269@deftypefun void mpz_addmul (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3270@deftypefunx void mpz_addmul_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
3271Set @var{rop} to @math{@var{rop} + @var{op1} @GMPtimes{} @var{op2}}.
3272@end deftypefun
3273
3274@deftypefun void mpz_submul (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3275@deftypefunx void mpz_submul_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
3276Set @var{rop} to @math{@var{rop} - @var{op1} @GMPtimes{} @var{op2}}.
3277@end deftypefun
3278
3279@deftypefun void mpz_mul_2exp (mpz_t @var{rop}, const mpz_t @var{op1}, mp_bitcnt_t @var{op2})
3280@cindex Bit shift left
3281Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to
3282@var{op2}}. This operation can also be defined as a left shift by @var{op2}
3283bits.
3284@end deftypefun
3285
3286@deftypefun void mpz_neg (mpz_t @var{rop}, const mpz_t @var{op})
3287Set @var{rop} to @minus{}@var{op}.
3288@end deftypefun
3289
3290@deftypefun void mpz_abs (mpz_t @var{rop}, const mpz_t @var{op})
3291Set @var{rop} to the absolute value of @var{op}.
3292@end deftypefun
3293
3294
3295@need 2000
3296@node Integer Division, Integer Exponentiation, Integer Arithmetic, Integer Functions
3297@section Division Functions
3298@cindex Integer division functions
3299@cindex Division functions
3300
3301Division is undefined if the divisor is zero. Passing a zero divisor to the
3302division or modulo functions (including the modular powering functions
3303@code{mpz_powm} and @code{mpz_powm_ui}), will cause an intentional division by
3304zero. This lets a program handle arithmetic exceptions in these functions the
3305same way as for normal C @code{int} arithmetic.
3306
3307@c Separate deftypefun groups for cdiv, fdiv and tdiv produce a blank line
3308@c between each, and seem to let tex do a better job of page breaks than an
3309@c @sp 1 in the middle of one big set.
3310
3311@deftypefun void mpz_cdiv_q (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d})
3312@deftypefunx void mpz_cdiv_r (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
3313@deftypefunx void mpz_cdiv_qr (mpz_t @var{q}, mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
3314@maybepagebreak
3315@deftypefunx {unsigned long int} mpz_cdiv_q_ui (mpz_t @var{q}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
3316@deftypefunx {unsigned long int} mpz_cdiv_r_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
3317@deftypefunx {unsigned long int} mpz_cdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{const mpz_t @var{n}}, @w{unsigned long int @var{d}})
3318@deftypefunx {unsigned long int} mpz_cdiv_ui (const mpz_t @var{n}, @w{unsigned long int @var{d}})
3319@maybepagebreak
3320@deftypefunx void mpz_cdiv_q_2exp (mpz_t @var{q}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
3321@deftypefunx void mpz_cdiv_r_2exp (mpz_t @var{r}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
3322@end deftypefun
3323
3324@deftypefun void mpz_fdiv_q (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d})
3325@deftypefunx void mpz_fdiv_r (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
3326@deftypefunx void mpz_fdiv_qr (mpz_t @var{q}, mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
3327@maybepagebreak
3328@deftypefunx {unsigned long int} mpz_fdiv_q_ui (mpz_t @var{q}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
3329@deftypefunx {unsigned long int} mpz_fdiv_r_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
3330@deftypefunx {unsigned long int} mpz_fdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{const mpz_t @var{n}}, @w{unsigned long int @var{d}})
3331@deftypefunx {unsigned long int} mpz_fdiv_ui (const mpz_t @var{n}, @w{unsigned long int @var{d}})
3332@maybepagebreak
3333@deftypefunx void mpz_fdiv_q_2exp (mpz_t @var{q}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
3334@deftypefunx void mpz_fdiv_r_2exp (mpz_t @var{r}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
3335@end deftypefun
3336
3337@deftypefun void mpz_tdiv_q (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d})
3338@deftypefunx void mpz_tdiv_r (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
3339@deftypefunx void mpz_tdiv_qr (mpz_t @var{q}, mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
3340@maybepagebreak
3341@deftypefunx {unsigned long int} mpz_tdiv_q_ui (mpz_t @var{q}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
3342@deftypefunx {unsigned long int} mpz_tdiv_r_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
3343@deftypefunx {unsigned long int} mpz_tdiv_qr_ui (mpz_t @var{q}, mpz_t @var{r}, @w{const mpz_t @var{n}}, @w{unsigned long int @var{d}})
3344@deftypefunx {unsigned long int} mpz_tdiv_ui (const mpz_t @var{n}, @w{unsigned long int @var{d}})
3345@maybepagebreak
3346@deftypefunx void mpz_tdiv_q_2exp (mpz_t @var{q}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
3347@deftypefunx void mpz_tdiv_r_2exp (mpz_t @var{r}, const mpz_t @var{n}, @w{mp_bitcnt_t @var{b}})
3348@cindex Bit shift right
3349
3350@sp 1
3351Divide @var{n} by @var{d}, forming a quotient @var{q} and/or remainder
3352@var{r}. For the @code{2exp} functions, @m{@var{d}=2^b, @var{d}=2^@var{b}}.
3353The rounding is in three styles, each suiting different applications.
3354
3355@itemize @bullet
3356@item
3357@code{cdiv} rounds @var{q} up towards @m{+\infty, +infinity}, and @var{r} will
3358have the opposite sign to @var{d}. The @code{c} stands for ``ceil''.
3359
3360@item
3361@code{fdiv} rounds @var{q} down towards @m{-\infty, @minus{}infinity}, and
3362@var{r} will have the same sign as @var{d}. The @code{f} stands for
3363``floor''.
3364
3365@item
3366@code{tdiv} rounds @var{q} towards zero, and @var{r} will have the same sign
3367as @var{n}. The @code{t} stands for ``truncate''.
3368@end itemize
3369
3370In all cases @var{q} and @var{r} will satisfy
3371@m{@var{n}=@var{q}@var{d}+@var{r}, @var{n}=@var{q}*@var{d}+@var{r}}, and
3372@var{r} will satisfy @math{0@le{}@GMPabs{@var{r}}<@GMPabs{@var{d}}}.
3373
3374The @code{q} functions calculate only the quotient, the @code{r} functions
3375only the remainder, and the @code{qr} functions calculate both. Note that for
3376@code{qr} the same variable cannot be passed for both @var{q} and @var{r}, or
3377results will be unpredictable.
3378
3379For the @code{ui} variants the return value is the remainder, and in fact
3380returning the remainder is all the @code{div_ui} functions do. For
3381@code{tdiv} and @code{cdiv} the remainder can be negative, so for those the
3382return value is the absolute value of the remainder.
3383
3384For the @code{2exp} variants the divisor is @m{2^b,2^@var{b}}. These
3385functions are implemented as right shifts and bit masks, but of course they
3386round the same as the other functions.
3387
3388For positive @var{n} both @code{mpz_fdiv_q_2exp} and @code{mpz_tdiv_q_2exp}
3389are simple bitwise right shifts. For negative @var{n}, @code{mpz_fdiv_q_2exp}
3390is effectively an arithmetic right shift treating @var{n} as twos complement
3391the same as the bitwise logical functions do, whereas @code{mpz_tdiv_q_2exp}
3392effectively treats @var{n} as sign and magnitude.
3393@end deftypefun
3394
3395@deftypefun void mpz_mod (mpz_t @var{r}, const mpz_t @var{n}, const mpz_t @var{d})
3396@deftypefunx {unsigned long int} mpz_mod_ui (mpz_t @var{r}, const mpz_t @var{n}, @w{unsigned long int @var{d}})
3397Set @var{r} to @var{n} @code{mod} @var{d}. The sign of the divisor is
3398ignored; the result is always non-negative.
3399
3400@code{mpz_mod_ui} is identical to @code{mpz_fdiv_r_ui} above, returning the
3401remainder as well as setting @var{r}. See @code{mpz_fdiv_ui} above if only
3402the return value is wanted.
3403@end deftypefun
3404
3405@deftypefun void mpz_divexact (mpz_t @var{q}, const mpz_t @var{n}, const mpz_t @var{d})
3406@deftypefunx void mpz_divexact_ui (mpz_t @var{q}, const mpz_t @var{n}, unsigned long @var{d})
3407@cindex Exact division functions
3408Set @var{q} to @var{n}/@var{d}. These functions produce correct results only
3409when it is known in advance that @var{d} divides @var{n}.
3410
3411These routines are much faster than the other division functions, and are the
3412best choice when exact division is known to occur, for example reducing a
3413rational to lowest terms.
3414@end deftypefun
3415
3416@deftypefun int mpz_divisible_p (const mpz_t @var{n}, const mpz_t @var{d})
3417@deftypefunx int mpz_divisible_ui_p (const mpz_t @var{n}, unsigned long int @var{d})
3418@deftypefunx int mpz_divisible_2exp_p (const mpz_t @var{n}, mp_bitcnt_t @var{b})
3419@cindex Divisibility functions
3420Return non-zero if @var{n} is exactly divisible by @var{d}, or in the case of
3421@code{mpz_divisible_2exp_p} by @m{2^b,2^@var{b}}.
3422
3423@var{n} is divisible by @var{d} if there exists an integer @var{q} satisfying
3424@math{@var{n} = @var{q}@GMPmultiply{}@var{d}}. Unlike the other division
3425functions, @math{@var{d}=0} is accepted and following the rule it can be seen
3426that only 0 is considered divisible by 0.
3427@end deftypefun
3428
3429@deftypefun int mpz_congruent_p (const mpz_t @var{n}, const mpz_t @var{c}, const mpz_t @var{d})
3430@deftypefunx int mpz_congruent_ui_p (const mpz_t @var{n}, unsigned long int @var{c}, unsigned long int @var{d})
3431@deftypefunx int mpz_congruent_2exp_p (const mpz_t @var{n}, const mpz_t @var{c}, mp_bitcnt_t @var{b})
3432@cindex Divisibility functions
3433@cindex Congruence functions
3434Return non-zero if @var{n} is congruent to @var{c} modulo @var{d}, or in the
3435case of @code{mpz_congruent_2exp_p} modulo @m{2^b,2^@var{b}}.
3436
3437@var{n} is congruent to @var{c} mod @var{d} if there exists an integer @var{q}
3438satisfying @math{@var{n} = @var{c} + @var{q}@GMPmultiply{}@var{d}}. Unlike
3439the other division functions, @math{@var{d}=0} is accepted and following the
3440rule it can be seen that @var{n} and @var{c} are considered congruent mod 0
3441only when exactly equal.
3442@end deftypefun
3443
3444
3445@need 2000
3446@node Integer Exponentiation, Integer Roots, Integer Division, Integer Functions
3447@section Exponentiation Functions
3448@cindex Integer exponentiation functions
3449@cindex Exponentiation functions
3450@cindex Powering functions
3451
3452@deftypefun void mpz_powm (mpz_t @var{rop}, const mpz_t @var{base}, const mpz_t @var{exp}, const mpz_t @var{mod})
3453@deftypefunx void mpz_powm_ui (mpz_t @var{rop}, const mpz_t @var{base}, unsigned long int @var{exp}, const mpz_t @var{mod})
3454Set @var{rop} to @m{base^{exp} \bmod mod, (@var{base} raised to @var{exp})
3455modulo @var{mod}}.
3456
3457Negative @var{exp} is supported if the inverse @mm{@var{base}@sup{-1} @bmod
3458@var{mod}, @var{base}^(-1) @bmod @var{mod}} exists (see @code{mpz_invert} in
3459@ref{Number Theoretic Functions}). If an inverse doesn't exist then a divide
3460by zero is raised.
3461@end deftypefun
3462
3463@deftypefun void mpz_powm_sec (mpz_t @var{rop}, const mpz_t @var{base}, const mpz_t @var{exp}, const mpz_t @var{mod})
3464Set @var{rop} to @m{base^{exp} \bmod @var{mod}, (@var{base} raised to @var{exp})
3465modulo @var{mod}}.
3466
3467It is required that @math{@var{exp} > 0} and that @var{mod} is odd.
3468
3469This function is designed to take the same time and have the same cache access
3470patterns for any two same-size arguments, assuming that function arguments are
3471placed at the same position and that the machine state is identical upon
3472function entry. This function is intended for cryptographic purposes, where
3473resilience to side-channel attacks is desired.
3474@end deftypefun
3475
3476@deftypefun void mpz_pow_ui (mpz_t @var{rop}, const mpz_t @var{base}, unsigned long int @var{exp})
3477@deftypefunx void mpz_ui_pow_ui (mpz_t @var{rop}, unsigned long int @var{base}, unsigned long int @var{exp})
3478Set @var{rop} to @m{base^{exp}, @var{base} raised to @var{exp}}. The case
3479@math{0^0} yields 1.
3480@end deftypefun
3481
3482
3483@need 2000
3484@node Integer Roots, Number Theoretic Functions, Integer Exponentiation, Integer Functions
3485@section Root Extraction Functions
3486@cindex Integer root functions
3487@cindex Root extraction functions
3488
3489@deftypefun int mpz_root (mpz_t @var{rop}, const mpz_t @var{op}, unsigned long int @var{n})
3490Set @var{rop} to @m{\lfloor\root n \of {op}\rfloor@C{},} the truncated integer
3491part of the @var{n}th root of @var{op}. Return non-zero if the computation
3492was exact, i.e., if @var{op} is @var{rop} to the @var{n}th power.
3493@end deftypefun
3494
3495@deftypefun void mpz_rootrem (mpz_t @var{root}, mpz_t @var{rem}, const mpz_t @var{u}, unsigned long int @var{n})
3496Set @var{root} to @m{\lfloor\root n \of {u}\rfloor@C{},} the truncated
3497integer part of the @var{n}th root of @var{u}. Set @var{rem} to the
3498remainder, @m{(@var{u} - @var{root}^n),
3499@var{u}@minus{}@var{root}**@var{n}}.
3500@end deftypefun
3501
3502@deftypefun void mpz_sqrt (mpz_t @var{rop}, const mpz_t @var{op})
3503Set @var{rop} to @m{\lfloor\sqrt{@var{op}}\rfloor@C{},} the truncated
3504integer part of the square root of @var{op}.
3505@end deftypefun
3506
3507@deftypefun void mpz_sqrtrem (mpz_t @var{rop1}, mpz_t @var{rop2}, const mpz_t @var{op})
3508Set @var{rop1} to @m{\lfloor\sqrt{@var{op}}\rfloor, the truncated integer part
3509of the square root of @var{op}}, like @code{mpz_sqrt}. Set @var{rop2} to the
3510remainder @m{(@var{op} - @var{rop1}^2),
3511@var{op}@minus{}@var{rop1}*@var{rop1}}, which will be zero if @var{op} is a
3512perfect square.
3513
3514If @var{rop1} and @var{rop2} are the same variable, the results are
3515undefined.
3516@end deftypefun
3517
3518@deftypefun int mpz_perfect_power_p (const mpz_t @var{op})
3519@cindex Perfect power functions
3520@cindex Root testing functions
3521Return non-zero if @var{op} is a perfect power, i.e., if there exist integers
3522@m{a,@var{a}} and @m{b,@var{b}}, with @m{b>1, @var{b}>1}, such that
3523@m{@var{op}=a^b, @var{op} equals @var{a} raised to the power @var{b}}.
3524
3525Under this definition both 0 and 1 are considered to be perfect powers.
3526Negative values of @var{op} are accepted, but of course can only be odd
3527perfect powers.
3528@end deftypefun
3529
3530@deftypefun int mpz_perfect_square_p (const mpz_t @var{op})
3531@cindex Perfect square functions
3532@cindex Root testing functions
3533Return non-zero if @var{op} is a perfect square, i.e., if the square root of
3534@var{op} is an integer. Under this definition both 0 and 1 are considered to
3535be perfect squares.
3536@end deftypefun
3537
3538
3539@need 2000
3540@node Number Theoretic Functions, Integer Comparisons, Integer Roots, Integer Functions
3541@section Number Theoretic Functions
3542@cindex Number theoretic functions
3543
3544@deftypefun int mpz_probab_prime_p (const mpz_t @var{n}, int @var{reps})
3545@cindex Prime testing functions
3546@cindex Probable prime testing functions
3547Determine whether @var{n} is prime. Return 2 if @var{n} is definitely prime,
3548return 1 if @var{n} is probably prime (without being certain), or return 0 if
3549@var{n} is definitely non-prime.
3550
3551This function performs some trial divisions, a Baillie-PSW probable prime
3552test, then @var{reps-24} Miller-Rabin probabilistic primality tests. A
3553higher @var{reps} value will reduce the chances of a non-prime being
3554identified as ``probably prime''. A composite number will be identified as a
3555prime with an asymptotic probability of less than @m{4^{-reps},4^(-@var{reps})}.
3556Reasonable values of @var{reps} are between 15 and 50.
3557
3558GMP versions up to and including 6.1.2 did not use the Baillie-PSW
3559primality test. In those older versions of GMP, this function performed
3560@var{reps} Miller-Rabin tests.
3561@end deftypefun
3562
3563@deftypefun void mpz_nextprime (mpz_t @var{rop}, const mpz_t @var{op})
3564@cindex Next prime function
3565Set @var{rop} to the next prime greater than @var{op}.
3566
3567This function uses a probabilistic algorithm to identify primes. For
3568practical purposes it's adequate, the chance of a composite passing will be
3569extremely small.
3570@end deftypefun
3571
3572@c mpz_prime_p not implemented as of gmp 3.0.
3573
3574@c @deftypefun int mpz_prime_p (const mpz_t @var{n})
3575@c Return non-zero if @var{n} is prime and zero if @var{n} is a non-prime.
3576@c This function is far slower than @code{mpz_probab_prime_p}, but then it
3577@c never returns non-zero for composite numbers.
3578
3579@c (For practical purposes, using @code{mpz_probab_prime_p} is adequate.
3580@c The likelihood of a programming error or hardware malfunction is orders
3581@c of magnitudes greater than the likelihood for a composite to pass as a
3582@c prime, if the @var{reps} argument is in the suggested range.)
3583@c @end deftypefun
3584
3585@deftypefun void mpz_gcd (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3586@cindex Greatest common divisor functions
3587@cindex GCD functions
3588Set @var{rop} to the greatest common divisor of @var{op1} and @var{op2}. The
3589result is always positive even if one or both input operands are negative.
3590Except if both inputs are zero; then this function defines @math{gcd(0,0) = 0}.
3591@end deftypefun
3592
3593@deftypefun {unsigned long int} mpz_gcd_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long int @var{op2})
3594Compute the greatest common divisor of @var{op1} and @var{op2}. If
3595@var{rop} is not @code{NULL}, store the result there.
3596
3597If the result is small enough to fit in an @code{unsigned long int}, it is
3598returned. If the result does not fit, 0 is returned, and the result is equal
3599to the argument @var{op1}. Note that the result will always fit if @var{op2}
3600is non-zero.
3601@end deftypefun
3602
3603@deftypefun void mpz_gcdext (mpz_t @var{g}, mpz_t @var{s}, mpz_t @var{t}, const mpz_t @var{a}, const mpz_t @var{b})
3604@cindex Extended GCD
3605@cindex GCD extended
3606Set @var{g} to the greatest common divisor of @var{a} and @var{b}, and in
3607addition set @var{s} and @var{t} to coefficients satisfying
3608@math{@var{a}@GMPmultiply{}@var{s} + @var{b}@GMPmultiply{}@var{t} = @var{g}}.
3609The value in @var{g} is always positive, even if one or both of @var{a} and
3610@var{b} are negative (or zero if both inputs are zero). The values in @var{s}
3611and @var{t} are chosen such that normally, @math{@GMPabs{@var{s}} <
3612@GMPabs{@var{b}} / (2 @var{g})} and @math{@GMPabs{@var{t}} < @GMPabs{@var{a}}
3613/ (2 @var{g})}, and these relations define @var{s} and @var{t} uniquely. There
3614are a few exceptional cases:
3615
3616If @math{@GMPabs{@var{a}} = @GMPabs{@var{b}}}, then @math{@var{s} = 0},
3617@math{@var{t} = sgn(@var{b})}.
3618
3619Otherwise, @math{@var{s} = sgn(@var{a})} if @math{@var{b} = 0} or
3620@math{@GMPabs{@var{b}} = 2 @var{g}}, and @math{@var{t} = sgn(@var{b})} if
3621@math{@var{a} = 0} or @math{@GMPabs{@var{a}} = 2 @var{g}}.
3622
3623In all cases, @math{@var{s} = 0} if and only if @math{@var{g} =
3624@GMPabs{@var{b}}}, i.e., if @var{b} divides @var{a} or @math{@var{a} = @var{b}
3625= 0}.
3626
3627If @var{t} or @var{g} is @code{NULL} then that value is not computed.
3628@end deftypefun
3629
3630@deftypefun void mpz_lcm (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3631@deftypefunx void mpz_lcm_ui (mpz_t @var{rop}, const mpz_t @var{op1}, unsigned long @var{op2})
3632@cindex Least common multiple functions
3633@cindex LCM functions
3634Set @var{rop} to the least common multiple of @var{op1} and @var{op2}.
3635@var{rop} is always positive, irrespective of the signs of @var{op1} and
3636@var{op2}. @var{rop} will be zero if either @var{op1} or @var{op2} is zero.
3637@end deftypefun
3638
3639@deftypefun int mpz_invert (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3640@cindex Modular inverse functions
3641@cindex Inverse modulo functions
3642Compute the inverse of @var{op1} modulo @var{op2} and put the result in
3643@var{rop}. If the inverse exists, the return value is non-zero and @var{rop}
3644will satisfy @math{0 @le{} @var{rop} < @GMPabs{@var{op2}}} (with @math{@var{rop}
3645= 0} possible only when @math{@GMPabs{@var{op2}} = 1}, i.e., in the
3646somewhat degenerate zero ring). If an inverse doesn't
3647exist the return value is zero and @var{rop} is undefined. The behaviour of
3648this function is undefined when @var{op2} is zero.
3649@end deftypefun
3650
3651@deftypefun int mpz_jacobi (const mpz_t @var{a}, const mpz_t @var{b})
3652@cindex Jacobi symbol functions
3653Calculate the Jacobi symbol @m{\left(a \over b\right),
3654(@var{a}/@var{b})}. This is defined only for @var{b} odd.
3655@end deftypefun
3656
3657@deftypefun int mpz_legendre (const mpz_t @var{a}, const mpz_t @var{p})
3658@cindex Legendre symbol functions
3659Calculate the Legendre symbol @m{\left(a \over p\right),
3660(@var{a}/@var{p})}. This is defined only for @var{p} an odd positive
3661prime, and for such @var{p} it's identical to the Jacobi symbol.
3662@end deftypefun
3663
3664@deftypefun int mpz_kronecker (const mpz_t @var{a}, const mpz_t @var{b})
3665@deftypefunx int mpz_kronecker_si (const mpz_t @var{a}, long @var{b})
3666@deftypefunx int mpz_kronecker_ui (const mpz_t @var{a}, unsigned long @var{b})
3667@deftypefunx int mpz_si_kronecker (long @var{a}, const mpz_t @var{b})
3668@deftypefunx int mpz_ui_kronecker (unsigned long @var{a}, const mpz_t @var{b})
3669@cindex Kronecker symbol functions
3670Calculate the Jacobi symbol @m{\left(a \over b\right),
3671(@var{a}/@var{b})} with the Kronecker extension @m{\left(a \over
36722\right) = \left(2 \over a\right), (a/2)=(2/a)} when @math{a} odd, or
3673@m{\left(a \over 2\right) = 0, (a/2)=0} when @math{a} even.
3674
3675When @var{b} is odd the Jacobi symbol and Kronecker symbol are
3676identical, so @code{mpz_kronecker_ui} etc can be used for mixed
3677precision Jacobi symbols too.
3678
3679For more information see Henri Cohen section 1.4.2 (@pxref{References}),
3680or any number theory textbook. See also the example program
3681@file{demos/qcn.c} which uses @code{mpz_kronecker_ui}.
3682@end deftypefun
3683
3684@deftypefun {mp_bitcnt_t} mpz_remove (mpz_t @var{rop}, const mpz_t @var{op}, const mpz_t @var{f})
3685@cindex Remove factor functions
3686@cindex Factor removal functions
3687Remove all occurrences of the factor @var{f} from @var{op} and store the
3688result in @var{rop}. The return value is how many such occurrences were
3689removed.
3690@end deftypefun
3691
3692@deftypefun void mpz_fac_ui (mpz_t @var{rop}, unsigned long int @var{n})
3693@deftypefunx void mpz_2fac_ui (mpz_t @var{rop}, unsigned long int @var{n})
3694@deftypefunx void mpz_mfac_uiui (mpz_t @var{rop}, unsigned long int @var{n}, unsigned long int @var{m})
3695@cindex Factorial functions
3696Set @var{rop} to the factorial of @var{n}: @code{mpz_fac_ui} computes the plain factorial @var{n}!,
3697@code{mpz_2fac_ui} computes the double-factorial @var{n}!!, and @code{mpz_mfac_uiui} the
3698@var{m}-multi-factorial @m{n!^{(m)}, @var{n}!^(@var{m})}.
3699@end deftypefun
3700
3701@deftypefun void mpz_primorial_ui (mpz_t @var{rop}, unsigned long int @var{n})
3702@cindex Primorial functions
3703Set @var{rop} to the primorial of @var{n}, i.e. the product of all positive
3704prime numbers @math{@le{}@var{n}}.
3705@end deftypefun
3706
3707@deftypefun void mpz_bin_ui (mpz_t @var{rop}, const mpz_t @var{n}, unsigned long int @var{k})
3708@deftypefunx void mpz_bin_uiui (mpz_t @var{rop}, unsigned long int @var{n}, @w{unsigned long int @var{k}})
3709@cindex Binomial coefficient functions
3710Compute the binomial coefficient @m{\left({n}\atop{k}\right), @var{n} over
3711@var{k}} and store the result in @var{rop}. Negative values of @var{n} are
3712supported by @code{mpz_bin_ui}, using the identity
3713@m{\left({-n}\atop{k}\right) = (-1)^k \left({n+k-1}\atop{k}\right),
3714bin(-n@C{}k) = (-1)^k * bin(n+k-1@C{}k)}, see Knuth volume 1 section 1.2.6
3715part G.
3716@end deftypefun
3717
3718@deftypefun void mpz_fib_ui (mpz_t @var{fn}, unsigned long int @var{n})
3719@deftypefunx void mpz_fib2_ui (mpz_t @var{fn}, mpz_t @var{fnsub1}, unsigned long int @var{n})
3720@cindex Fibonacci sequence functions
3721@code{mpz_fib_ui} sets @var{fn} to to @m{F_n,F[n]}, the @var{n}'th Fibonacci
3722number. @code{mpz_fib2_ui} sets @var{fn} to @m{F_n,F[n]}, and @var{fnsub1} to
3723@m{F_{n-1},F[n-1]}.
3724
3725These functions are designed for calculating isolated Fibonacci numbers. When
3726a sequence of values is wanted it's best to start with @code{mpz_fib2_ui} and
3727iterate the defining @m{F_{n+1} = F_n + F_{n-1}, F[n+1]=F[n]+F[n-1]} or
3728similar.
3729@end deftypefun
3730
3731@deftypefun void mpz_lucnum_ui (mpz_t @var{ln}, unsigned long int @var{n})
3732@deftypefunx void mpz_lucnum2_ui (mpz_t @var{ln}, mpz_t @var{lnsub1}, unsigned long int @var{n})
3733@cindex Lucas number functions
3734@code{mpz_lucnum_ui} sets @var{ln} to to @m{L_n,L[n]}, the @var{n}'th Lucas
3735number. @code{mpz_lucnum2_ui} sets @var{ln} to @m{L_n,L[n]}, and @var{lnsub1}
3736to @m{L_{n-1},L[n-1]}.
3737
3738These functions are designed for calculating isolated Lucas numbers. When a
3739sequence of values is wanted it's best to start with @code{mpz_lucnum2_ui} and
3740iterate the defining @m{L_{n+1} = L_n + L_{n-1}, L[n+1]=L[n]+L[n-1]} or
3741similar.
3742
3743The Fibonacci numbers and Lucas numbers are related sequences, so it's never
3744necessary to call both @code{mpz_fib2_ui} and @code{mpz_lucnum2_ui}. The
3745formulas for going from Fibonacci to Lucas can be found in @ref{Lucas Numbers
3746Algorithm}, the reverse is straightforward too.
3747@end deftypefun
3748
3749
3750@node Integer Comparisons, Integer Logic and Bit Fiddling, Number Theoretic Functions, Integer Functions
3751@comment node-name, next, previous, up
3752@section Comparison Functions
3753@cindex Integer comparison functions
3754@cindex Comparison functions
3755
3756@deftypefn Function int mpz_cmp (const mpz_t @var{op1}, const mpz_t @var{op2})
3757@deftypefnx Function int mpz_cmp_d (const mpz_t @var{op1}, double @var{op2})
3758@deftypefnx Macro int mpz_cmp_si (const mpz_t @var{op1}, signed long int @var{op2})
3759@deftypefnx Macro int mpz_cmp_ui (const mpz_t @var{op1}, unsigned long int @var{op2})
3760Compare @var{op1} and @var{op2}. Return a positive value if @math{@var{op1} >
3761@var{op2}}, zero if @math{@var{op1} = @var{op2}}, or a negative value if
3762@math{@var{op1} < @var{op2}}.
3763
3764@code{mpz_cmp_ui} and @code{mpz_cmp_si} are macros and will evaluate their
3765arguments more than once. @code{mpz_cmp_d} can be called with an infinity,
3766but results are undefined for a NaN.
3767@end deftypefn
3768
3769@deftypefn Function int mpz_cmpabs (const mpz_t @var{op1}, const mpz_t @var{op2})
3770@deftypefnx Function int mpz_cmpabs_d (const mpz_t @var{op1}, double @var{op2})
3771@deftypefnx Function int mpz_cmpabs_ui (const mpz_t @var{op1}, unsigned long int @var{op2})
3772Compare the absolute values of @var{op1} and @var{op2}. Return a positive
3773value if @math{@GMPabs{@var{op1}} > @GMPabs{@var{op2}}}, zero if
3774@math{@GMPabs{@var{op1}} = @GMPabs{@var{op2}}}, or a negative value if
3775@math{@GMPabs{@var{op1}} < @GMPabs{@var{op2}}}.
3776
3777@code{mpz_cmpabs_d} can be called with an infinity, but results are undefined
3778for a NaN.
3779@end deftypefn
3780
3781@deftypefn Macro int mpz_sgn (const mpz_t @var{op})
3782@cindex Sign tests
3783@cindex Integer sign tests
3784Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
3785@math{-1} if @math{@var{op} < 0}.
3786
3787This function is actually implemented as a macro. It evaluates its argument
3788multiple times.
3789@end deftypefn
3790
3791
3792@node Integer Logic and Bit Fiddling, I/O of Integers, Integer Comparisons, Integer Functions
3793@comment node-name, next, previous, up
3794@section Logical and Bit Manipulation Functions
3795@cindex Logical functions
3796@cindex Bit manipulation functions
3797@cindex Integer logical functions
3798@cindex Integer bit manipulation functions
3799
3800These functions behave as if twos complement arithmetic were used (although
3801sign-magnitude is the actual implementation). The least significant bit is
3802number 0.
3803
3804@deftypefun void mpz_and (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3805Set @var{rop} to @var{op1} bitwise-and @var{op2}.
3806@end deftypefun
3807
3808@deftypefun void mpz_ior (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3809Set @var{rop} to @var{op1} bitwise inclusive-or @var{op2}.
3810@end deftypefun
3811
3812@deftypefun void mpz_xor (mpz_t @var{rop}, const mpz_t @var{op1}, const mpz_t @var{op2})
3813Set @var{rop} to @var{op1} bitwise exclusive-or @var{op2}.
3814@end deftypefun
3815
3816@deftypefun void mpz_com (mpz_t @var{rop}, const mpz_t @var{op})
3817Set @var{rop} to the one's complement of @var{op}.
3818@end deftypefun
3819
3820@deftypefun {mp_bitcnt_t} mpz_popcount (const mpz_t @var{op})
3821If @math{@var{op}@ge{}0}, return the population count of @var{op}, which is the
3822number of 1 bits in the binary representation. If @math{@var{op}<0}, the
3823number of 1s is infinite, and the return value is the largest possible
3824@code{mp_bitcnt_t}.
3825@end deftypefun
3826
3827@deftypefun {mp_bitcnt_t} mpz_hamdist (const mpz_t @var{op1}, const mpz_t @var{op2})
3828If @var{op1} and @var{op2} are both @math{@ge{}0} or both @math{<0}, return the
3829hamming distance between the two operands, which is the number of bit positions
3830where @var{op1} and @var{op2} have different bit values. If one operand is
3831@math{@ge{}0} and the other @math{<0} then the number of bits different is
3832infinite, and the return value is the largest possible @code{mp_bitcnt_t}.
3833@end deftypefun
3834
3835@deftypefun {mp_bitcnt_t} mpz_scan0 (const mpz_t @var{op}, mp_bitcnt_t @var{starting_bit})
3836@deftypefunx {mp_bitcnt_t} mpz_scan1 (const mpz_t @var{op}, mp_bitcnt_t @var{starting_bit})
3837@cindex Bit scanning functions
3838@cindex Scan bit functions
3839Scan @var{op}, starting from bit @var{starting_bit}, towards more significant
3840bits, until the first 0 or 1 bit (respectively) is found. Return the index of
3841the found bit.
3842
3843If the bit at @var{starting_bit} is already what's sought, then
3844@var{starting_bit} is returned.
3845
3846If there's no bit found, then the largest possible @code{mp_bitcnt_t} is
3847returned. This will happen in @code{mpz_scan0} past the end of a negative
3848number, or @code{mpz_scan1} past the end of a nonnegative number.
3849@end deftypefun
3850
3851@deftypefun void mpz_setbit (mpz_t @var{rop}, mp_bitcnt_t @var{bit_index})
3852Set bit @var{bit_index} in @var{rop}.
3853@end deftypefun
3854
3855@deftypefun void mpz_clrbit (mpz_t @var{rop}, mp_bitcnt_t @var{bit_index})
3856Clear bit @var{bit_index} in @var{rop}.
3857@end deftypefun
3858
3859@deftypefun void mpz_combit (mpz_t @var{rop}, mp_bitcnt_t @var{bit_index})
3860Complement bit @var{bit_index} in @var{rop}.
3861@end deftypefun
3862
3863@deftypefun int mpz_tstbit (const mpz_t @var{op}, mp_bitcnt_t @var{bit_index})
3864Test bit @var{bit_index} in @var{op} and return 0 or 1 accordingly.
3865@end deftypefun
3866
3867@node I/O of Integers, Integer Random Numbers, Integer Logic and Bit Fiddling, Integer Functions
3868@comment node-name, next, previous, up
3869@section Input and Output Functions
3870@cindex Integer input and output functions
3871@cindex Input functions
3872@cindex Output functions
3873@cindex I/O functions
3874
3875Functions that perform input from a stdio stream, and functions that output to
3876a stdio stream, of @code{mpz} numbers. Passing a @code{NULL} pointer for a
3877@var{stream} argument to any of these functions will make them read from
3878@code{stdin} and write to @code{stdout}, respectively.
3879
3880When using any of these functions, it is a good idea to include @file{stdio.h}
3881before @file{gmp.h}, since that will allow @file{gmp.h} to define prototypes
3882for these functions.
3883
3884See also @ref{Formatted Output} and @ref{Formatted Input}.
3885
3886@deftypefun size_t mpz_out_str (FILE *@var{stream}, int @var{base}, const mpz_t @var{op})
3887Output @var{op} on stdio stream @var{stream}, as a string of digits in base
3888@var{base}. The base argument may vary from 2 to 62 or from @minus{}2 to
3889@minus{}36.
3890
3891For @var{base} in the range 2..36, digits and lower-case letters are used; for
3892@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
3893digits, upper-case letters, and lower-case letters (in that significance order)
3894are used.
3895
3896Return the number of bytes written, or if an error occurred, return 0.
3897@end deftypefun
3898
3899@deftypefun size_t mpz_inp_str (mpz_t @var{rop}, FILE *@var{stream}, int @var{base})
3900Input a possibly white-space preceded string in base @var{base} from stdio
3901stream @var{stream}, and put the read integer in @var{rop}.
3902
3903The @var{base} may vary from 2 to 62, or if @var{base} is 0, then the leading
3904characters are used: @code{0x} and @code{0X} for hexadecimal, @code{0b} and
3905@code{0B} for binary, @code{0} for octal, or decimal otherwise.
3906
3907For bases up to 36, case is ignored; upper-case and lower-case letters have
3908the same value. For bases 37 to 62, upper-case letter represent the usual
390910..35 while lower-case letter represent 36..61.
3910
3911Return the number of bytes read, or if an error occurred, return 0.
3912@end deftypefun
3913
3914@deftypefun size_t mpz_out_raw (FILE *@var{stream}, const mpz_t @var{op})
3915Output @var{op} on stdio stream @var{stream}, in raw binary format. The
3916integer is written in a portable format, with 4 bytes of size information, and
3917that many bytes of limbs. Both the size and the limbs are written in
3918decreasing significance order (i.e., in big-endian).
3919
3920The output can be read with @code{mpz_inp_raw}.
3921
3922Return the number of bytes written, or if an error occurred, return 0.
3923
3924The output of this can not be read by @code{mpz_inp_raw} from GMP 1, because
3925of changes necessary for compatibility between 32-bit and 64-bit machines.
3926@end deftypefun
3927
3928@deftypefun size_t mpz_inp_raw (mpz_t @var{rop}, FILE *@var{stream})
3929Input from stdio stream @var{stream} in the format written by
3930@code{mpz_out_raw}, and put the result in @var{rop}. Return the number of
3931bytes read, or if an error occurred, return 0.
3932
3933This routine can read the output from @code{mpz_out_raw} also from GMP 1, in
3934spite of changes necessary for compatibility between 32-bit and 64-bit
3935machines.
3936@end deftypefun
3937
3938
3939@need 2000
3940@node Integer Random Numbers, Integer Import and Export, I/O of Integers, Integer Functions
3941@comment node-name, next, previous, up
3942@section Random Number Functions
3943@cindex Integer random number functions
3944@cindex Random number functions
3945
3946The random number functions of GMP come in two groups; older function
3947that rely on a global state, and newer functions that accept a state
3948parameter that is read and modified. Please see the @ref{Random Number
3949Functions} for more information on how to use and not to use random
3950number functions.
3951
3952@deftypefun void mpz_urandomb (mpz_t @var{rop}, gmp_randstate_t @var{state}, mp_bitcnt_t @var{n})
3953Generate a uniformly distributed random integer in the range 0 to
3954@mm{2@sup{n}-1, 2^@var{n}@minus{}1}, inclusive.
3955
3956The variable @var{state} must be initialized by calling one of the
3957@code{gmp_randinit} functions (@ref{Random State Initialization}) before
3958invoking this function.
3959@end deftypefun
3960
3961@deftypefun void mpz_urandomm (mpz_t @var{rop}, gmp_randstate_t @var{state}, const mpz_t @var{n})
3962Generate a uniform random integer in the range 0 to @math{@var{n}-1},
3963inclusive.
3964
3965The variable @var{state} must be initialized by calling one of the
3966@code{gmp_randinit} functions (@ref{Random State Initialization})
3967before invoking this function.
3968@end deftypefun
3969
3970@deftypefun void mpz_rrandomb (mpz_t @var{rop}, gmp_randstate_t @var{state}, mp_bitcnt_t @var{n})
3971Generate a random integer with long strings of zeros and ones in the
3972binary representation. Useful for testing functions and algorithms,
3973since this kind of random numbers have proven to be more likely to
3974trigger corner-case bugs. The random number will be in the range
3975@mm{2@sup{n-1}, 2^(@var{n}@minus{}1)} to @mm{2@sup{n}-1,
39762^@var{n}@minus{}1}, inclusive.
3977
3978The variable @var{state} must be initialized by calling one of the
3979@code{gmp_randinit} functions (@ref{Random State Initialization})
3980before invoking this function.
3981@end deftypefun
3982
3983@deftypefun void mpz_random (mpz_t @var{rop}, mp_size_t @var{max_size})
3984Generate a random integer of at most @var{max_size} limbs. The generated
3985random number doesn't satisfy any particular requirements of randomness.
3986Negative random numbers are generated when @var{max_size} is negative.
3987
3988This function is obsolete. Use @code{mpz_urandomb} or
3989@code{mpz_urandomm} instead.
3990@end deftypefun
3991
3992@deftypefun void mpz_random2 (mpz_t @var{rop}, mp_size_t @var{max_size})
3993Generate a random integer of at most @var{max_size} limbs, with long strings
3994of zeros and ones in the binary representation. Useful for testing functions
3995and algorithms, since this kind of random numbers have proven to be more
3996likely to trigger corner-case bugs. Negative random numbers are generated
3997when @var{max_size} is negative.
3998
3999This function is obsolete. Use @code{mpz_rrandomb} instead.
4000@end deftypefun
4001
4002
4003@node Integer Import and Export, Miscellaneous Integer Functions, Integer Random Numbers, Integer Functions
4004@section Integer Import and Export
4005
4006@code{mpz_t} variables can be converted to and from arbitrary words of binary
4007data with the following functions.
4008
4009@deftypefun void mpz_import (mpz_t @var{rop}, size_t @var{count}, int @var{order}, size_t @var{size}, int @var{endian}, size_t @var{nails}, const void *@var{op})
4010@cindex Integer import
4011@cindex Import
4012Set @var{rop} from an array of word data at @var{op}.
4013
4014The parameters specify the format of the data. @var{count} many words are
4015read, each @var{size} bytes. @var{order} can be 1 for most significant word
4016first or -1 for least significant first. Within each word @var{endian} can be
40171 for most significant byte first, -1 for least significant first, or 0 for
4018the native endianness of the host CPU@. The most significant @var{nails} bits
4019of each word are skipped, this can be 0 to use the full words.
4020
4021There is no sign taken from the data, @var{rop} will simply be a positive
4022integer. An application can handle any sign itself, and apply it for instance
4023with @code{mpz_neg}.
4024
4025There are no data alignment restrictions on @var{op}, any address is allowed.
4026
4027Here's an example converting an array of @code{unsigned long} data, most
4028significant element first, and host byte order within each value.
4029
4030@example
4031unsigned long a[20];
4032/* Initialize @var{z} and @var{a} */
4033mpz_import (z, 20, 1, sizeof(a[0]), 0, 0, a);
4034@end example
4035
4036This example assumes the full @code{sizeof} bytes are used for data in the
4037given type, which is usually true, and certainly true for @code{unsigned long}
4038everywhere we know of. However on Cray vector systems it may be noted that
4039@code{short} and @code{int} are always stored in 8 bytes (and with
4040@code{sizeof} indicating that) but use only 32 or 46 bits. The @var{nails}
4041feature can account for this, by passing for instance
4042@code{8*sizeof(int)-INT_BIT}.
4043@end deftypefun
4044
4045@deftypefun {void *} mpz_export (void *@var{rop}, size_t *@var{countp}, int @var{order}, size_t @var{size}, int @var{endian}, size_t @var{nails}, const mpz_t @var{op})
4046@cindex Integer export
4047@cindex Export
4048Fill @var{rop} with word data from @var{op}.
4049
4050The parameters specify the format of the data produced. Each word will be
4051@var{size} bytes and @var{order} can be 1 for most significant word first or
4052-1 for least significant first. Within each word @var{endian} can be 1 for
4053most significant byte first, -1 for least significant first, or 0 for the
4054native endianness of the host CPU@. The most significant @var{nails} bits of
4055each word are unused and set to zero, this can be 0 to produce full words.
4056
4057The number of words produced is written to @code{*@var{countp}}, or
4058@var{countp} can be @code{NULL} to discard the count. @var{rop} must have
4059enough space for the data, or if @var{rop} is @code{NULL} then a result array
4060of the necessary size is allocated using the current GMP allocation function
4061(@pxref{Custom Allocation}). In either case the return value is the
4062destination used, either @var{rop} or the allocated block.
4063
4064If @var{op} is non-zero then the most significant word produced will be
4065non-zero. If @var{op} is zero then the count returned will be zero and
4066nothing written to @var{rop}. If @var{rop} is @code{NULL} in this case, no
4067block is allocated, just @code{NULL} is returned.
4068
4069The sign of @var{op} is ignored, just the absolute value is exported. An
4070application can use @code{mpz_sgn} to get the sign and handle it as desired.
4071(@pxref{Integer Comparisons})
4072
4073There are no data alignment restrictions on @var{rop}, any address is allowed.
4074
4075When an application is allocating space itself the required size can be
4076determined with a calculation like the following. Since @code{mpz_sizeinbase}
4077always returns at least 1, @code{count} here will be at least one, which
4078avoids any portability problems with @code{malloc(0)}, though if @code{z} is
4079zero no space at all is actually needed (or written).
4080
4081@example
4082numb = 8*size - nail;
4083count = (mpz_sizeinbase (z, 2) + numb-1) / numb;
4084p = malloc (count * size);
4085@end example
4086@end deftypefun
4087
4088
4089@need 2000
4090@node Miscellaneous Integer Functions, Integer Special Functions, Integer Import and Export, Integer Functions
4091@comment node-name, next, previous, up
4092@section Miscellaneous Functions
4093@cindex Miscellaneous integer functions
4094@cindex Integer miscellaneous functions
4095
4096@deftypefun int mpz_fits_ulong_p (const mpz_t @var{op})
4097@deftypefunx int mpz_fits_slong_p (const mpz_t @var{op})
4098@deftypefunx int mpz_fits_uint_p (const mpz_t @var{op})
4099@deftypefunx int mpz_fits_sint_p (const mpz_t @var{op})
4100@deftypefunx int mpz_fits_ushort_p (const mpz_t @var{op})
4101@deftypefunx int mpz_fits_sshort_p (const mpz_t @var{op})
4102Return non-zero iff the value of @var{op} fits in an @code{unsigned long int},
4103@code{signed long int}, @code{unsigned int}, @code{signed int}, @code{unsigned
4104short int}, or @code{signed short int}, respectively. Otherwise, return zero.
4105@end deftypefun
4106
4107@deftypefn Macro int mpz_odd_p (const mpz_t @var{op})
4108@deftypefnx Macro int mpz_even_p (const mpz_t @var{op})
4109Determine whether @var{op} is odd or even, respectively. Return non-zero if
4110yes, zero if no. These macros evaluate their argument more than once.
4111@end deftypefn
4112
4113@deftypefun size_t mpz_sizeinbase (const mpz_t @var{op}, int @var{base})
4114@cindex Size in digits
4115@cindex Digits in an integer
4116Return the size of @var{op} measured in number of digits in the given
4117@var{base}. @var{base} can vary from 2 to 62. The sign of @var{op} is
4118ignored, just the absolute value is used. The result will be either exact or
41191 too big. If @var{base} is a power of 2, the result is always exact. If
4120@var{op} is zero the return value is always 1.
4121
4122This function can be used to determine the space required when converting
4123@var{op} to a string. The right amount of allocation is normally two more
4124than the value returned by @code{mpz_sizeinbase}, one extra for a minus sign
4125and one for the null-terminator.
4126
4127@cindex Most significant bit
4128It will be noted that @code{mpz_sizeinbase(@var{op},2)} can be used to locate
4129the most significant 1 bit in @var{op}, counting from 1. (Unlike the bitwise
4130functions which start from 0, @xref{Integer Logic and Bit Fiddling,, Logical
4131and Bit Manipulation Functions}.)
4132@end deftypefun
4133
4134
4135@node Integer Special Functions, , Miscellaneous Integer Functions, Integer Functions
4136@section Special Functions
4137@cindex Special integer functions
4138@cindex Integer special functions
4139
4140The functions in this section are for various special purposes. Most
4141applications will not need them.
4142
4143@deftypefun void mpz_array_init (mpz_t @var{integer_array}, mp_size_t @var{array_size}, @w{mp_size_t @var{fixed_num_bits}})
4144@strong{This is an obsolete function. Do not use it.}
4145@end deftypefun
4146
4147@deftypefun {void *} _mpz_realloc (mpz_t @var{integer}, mp_size_t @var{new_alloc})
4148Change the space for @var{integer} to @var{new_alloc} limbs. The value in
4149@var{integer} is preserved if it fits, or is set to 0 if not. The return
4150value is not useful to applications and should be ignored.
4151
4152@code{mpz_realloc2} is the preferred way to accomplish allocation changes like
4153this. @code{mpz_realloc2} and @code{_mpz_realloc} are the same except that
4154@code{_mpz_realloc} takes its size in limbs.
4155@end deftypefun
4156
4157@deftypefun mp_limb_t mpz_getlimbn (const mpz_t @var{op}, mp_size_t @var{n})
4158Return limb number @var{n} from @var{op}. The sign of @var{op} is ignored,
4159just the absolute value is used. The least significant limb is number 0.
4160
4161@code{mpz_size} can be used to find how many limbs make up @var{op}.
4162@code{mpz_getlimbn} returns zero if @var{n} is outside the range 0 to
4163@code{mpz_size(@var{op})-1}.
4164@end deftypefun
4165
4166@deftypefun size_t mpz_size (const mpz_t @var{op})
4167Return the size of @var{op} measured in number of limbs. If @var{op} is zero,
4168the returned value will be zero.
4169@c (@xref{Nomenclature}, for an explanation of the concept @dfn{limb}.)
4170@end deftypefun
4171
4172@deftypefun {const mp_limb_t *} mpz_limbs_read (const mpz_t @var{x})
4173Return a pointer to the limb array representing the absolute value of @var{x}.
4174The size of the array is @code{mpz_size(@var{x})}. Intended for read access
4175only.
4176@end deftypefun
4177
4178@deftypefun {mp_limb_t *} mpz_limbs_write (mpz_t @var{x}, mp_size_t @var{n})
4179@deftypefunx {mp_limb_t *} mpz_limbs_modify (mpz_t @var{x}, mp_size_t @var{n})
4180Return a pointer to the limb array, intended for write access. The array is
4181reallocated as needed, to make room for @var{n} limbs. Requires @math{@var{n}
4182> 0}. The @code{mpz_limbs_modify} function returns an array that holds the old
4183absolute value of @var{x}, while @code{mpz_limbs_write} may destroy the old
4184value and return an array with unspecified contents.
4185@end deftypefun
4186
4187@deftypefun void mpz_limbs_finish (mpz_t @var{x}, mp_size_t @var{s})
4188Updates the internal size field of @var{x}. Used after writing to the limb
4189array pointer returned by @code{mpz_limbs_write} or @code{mpz_limbs_modify} is
4190completed. The array should contain @math{@GMPabs{@var{s}}} valid limbs,
4191representing the new absolute value for @var{x}, and the sign of @var{x} is
4192taken from the sign of @var{s}. This function never reallocates @var{x}, so
4193the limb pointer remains valid.
4194@end deftypefun
4195
4196@c FIXME: Some more useful and less silly example?
4197@example
4198void foo (mpz_t x)
4199@{
4200 mp_size_t n, i;
4201 mp_limb_t *xp;
4202
4203 n = mpz_size (x);
4204 xp = mpz_limbs_modify (x, 2*n);
4205 for (i = 0; i < n; i++)
4206 xp[n+i] = xp[n-1-i];
4207 mpz_limbs_finish (x, mpz_sgn (x) < 0 ? - 2*n : 2*n);
4208@}
4209@end example
4210
4211@deftypefun mpz_srcptr mpz_roinit_n (mpz_t @var{x}, const mp_limb_t *@var{xp}, mp_size_t @var{xs})
4212Special initialization of @var{x}, using the given limb array and size.
4213@var{x} should be treated as read-only: it can be passed safely as input to
4214any mpz function, but not as an output. The array @var{xp} must point to at
4215least a readable limb, its size is
4216@math{@GMPabs{@var{xs}}}, and the sign of @var{x} is the sign of @var{xs}. For
4217convenience, the function returns @var{x}, but cast to a const pointer type.
4218@end deftypefun
4219
4220@example
4221void foo (mpz_t x)
4222@{
4223 static const mp_limb_t y[3] = @{ 0x1, 0x2, 0x3 @};
4224 mpz_t tmp;
4225 mpz_add (x, x, mpz_roinit_n (tmp, y, 3));
4226@}
4227@end example
4228
4229@deftypefn Macro mpz_t MPZ_ROINIT_N (mp_limb_t *@var{xp}, mp_size_t @var{xs})
4230This macro expands to an initializer which can be assigned to an mpz_t
4231variable. The limb array @var{xp} must point to at least a readable limb,
4232moreover, unlike the @code{mpz_roinit_n} function, the array must be
4233normalized: if @var{xs} is non-zero, then
4234@code{@var{xp}[@math{@GMPabs{@var{xs}}-1}]} must be non-zero. Intended
4235primarily for constant values. Using it for non-constant values requires a C
4236compiler supporting C99.
4237@end deftypefn
4238
4239@example
4240void foo (mpz_t x)
4241@{
4242 static const mp_limb_t ya[3] = @{ 0x1, 0x2, 0x3 @};
4243 static const mpz_t y = MPZ_ROINIT_N ((mp_limb_t *) ya, 3);
4244
4245 mpz_add (x, x, y);
4246@}
4247@end example
4248
4249
4250@node Rational Number Functions, Floating-point Functions, Integer Functions, Top
4251@comment node-name, next, previous, up
4252@chapter Rational Number Functions
4253@cindex Rational number functions
4254
4255This chapter describes the GMP functions for performing arithmetic on rational
4256numbers. These functions start with the prefix @code{mpq_}.
4257
4258Rational numbers are stored in objects of type @code{mpq_t}.
4259
4260All rational arithmetic functions assume operands have a canonical form, and
4261canonicalize their result. The canonical form means that the denominator and
4262the numerator have no common factors, and that the denominator is positive.
4263Zero has the unique representation 0/1.
4264
4265Pure assignment functions do not canonicalize the assigned variable. It is
4266the responsibility of the user to canonicalize the assigned variable before
4267any arithmetic operations are performed on that variable.
4268
4269@deftypefun void mpq_canonicalize (mpq_t @var{op})
4270Remove any factors that are common to the numerator and denominator of
4271@var{op}, and make the denominator positive.
4272@end deftypefun
4273
4274@menu
4275* Initializing Rationals::
4276* Rational Conversions::
4277* Rational Arithmetic::
4278* Comparing Rationals::
4279* Applying Integer Functions::
4280* I/O of Rationals::
4281@end menu
4282
4283@node Initializing Rationals, Rational Conversions, Rational Number Functions, Rational Number Functions
4284@comment node-name, next, previous, up
4285@section Initialization and Assignment Functions
4286@cindex Rational assignment functions
4287@cindex Assignment functions
4288@cindex Rational initialization functions
4289@cindex Initialization functions
4290
4291@deftypefun void mpq_init (mpq_t @var{x})
4292Initialize @var{x} and set it to 0/1. Each variable should normally only be
4293initialized once, or at least cleared out (using the function @code{mpq_clear})
4294between each initialization.
4295@end deftypefun
4296
4297@deftypefun void mpq_inits (mpq_t @var{x}, ...)
4298Initialize a NULL-terminated list of @code{mpq_t} variables, and set their
4299values to 0/1.
4300@end deftypefun
4301
4302@deftypefun void mpq_clear (mpq_t @var{x})
4303Free the space occupied by @var{x}. Make sure to call this function for all
4304@code{mpq_t} variables when you are done with them.
4305@end deftypefun
4306
4307@deftypefun void mpq_clears (mpq_t @var{x}, ...)
4308Free the space occupied by a NULL-terminated list of @code{mpq_t} variables.
4309@end deftypefun
4310
4311@deftypefun void mpq_set (mpq_t @var{rop}, const mpq_t @var{op})
4312@deftypefunx void mpq_set_z (mpq_t @var{rop}, const mpz_t @var{op})
4313Assign @var{rop} from @var{op}.
4314@end deftypefun
4315
4316@deftypefun void mpq_set_ui (mpq_t @var{rop}, unsigned long int @var{op1}, unsigned long int @var{op2})
4317@deftypefunx void mpq_set_si (mpq_t @var{rop}, signed long int @var{op1}, unsigned long int @var{op2})
4318Set the value of @var{rop} to @var{op1}/@var{op2}. Note that if @var{op1} and
4319@var{op2} have common factors, @var{rop} has to be passed to
4320@code{mpq_canonicalize} before any operations are performed on @var{rop}.
4321@end deftypefun
4322
4323@deftypefun int mpq_set_str (mpq_t @var{rop}, const char *@var{str}, int @var{base})
4324Set @var{rop} from a null-terminated string @var{str} in the given @var{base}.
4325
4326The string can be an integer like ``41'' or a fraction like ``41/152''. The
4327fraction must be in canonical form (@pxref{Rational Number Functions}), or if
4328not then @code{mpq_canonicalize} must be called.
4329
4330The numerator and optional denominator are parsed the same as in
4331@code{mpz_set_str} (@pxref{Assigning Integers}). White space is allowed in
4332the string, and is simply ignored. The @var{base} can vary from 2 to 62, or
4333if @var{base} is 0 then the leading characters are used: @code{0x} or @code{0X} for hex,
4334@code{0b} or @code{0B} for binary,
4335@code{0} for octal, or decimal otherwise. Note that this is done separately
4336for the numerator and denominator, so for instance @code{0xEF/100} is 239/100,
4337whereas @code{0xEF/0x100} is 239/256.
4338
4339The return value is 0 if the entire string is a valid number, or @minus{}1 if
4340not.
4341@end deftypefun
4342
4343@deftypefun void mpq_swap (mpq_t @var{rop1}, mpq_t @var{rop2})
4344Swap the values @var{rop1} and @var{rop2} efficiently.
4345@end deftypefun
4346
4347
4348@need 2000
4349@node Rational Conversions, Rational Arithmetic, Initializing Rationals, Rational Number Functions
4350@comment node-name, next, previous, up
4351@section Conversion Functions
4352@cindex Rational conversion functions
4353@cindex Conversion functions
4354
4355@deftypefun double mpq_get_d (const mpq_t @var{op})
4356Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
4357towards zero).
4358
4359If the exponent from the conversion is too big or too small to fit a
4360@code{double} then the result is system dependent. For too big an infinity is
4361returned when available. For too small @math{0.0} is normally returned.
4362Hardware overflow, underflow and denorm traps may or may not occur.
4363@end deftypefun
4364
4365@deftypefun void mpq_set_d (mpq_t @var{rop}, double @var{op})
4366@deftypefunx void mpq_set_f (mpq_t @var{rop}, const mpf_t @var{op})
4367Set @var{rop} to the value of @var{op}. There is no rounding, this conversion
4368is exact.
4369@end deftypefun
4370
4371@deftypefun {char *} mpq_get_str (char *@var{str}, int @var{base}, const mpq_t @var{op})
4372Convert @var{op} to a string of digits in base @var{base}. The base argument
4373may vary from 2 to 62 or from @minus{}2 to @minus{}36. The string will be of
4374the form @samp{num/den}, or if the denominator is 1 then just @samp{num}.
4375
4376For @var{base} in the range 2..36, digits and lower-case letters are used; for
4377@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
4378digits, upper-case letters, and lower-case letters (in that significance order)
4379are used.
4380
4381If @var{str} is @code{NULL}, the result string is allocated using the current
4382allocation function (@pxref{Custom Allocation}). The block will be
4383@code{strlen(str)+1} bytes, that being exactly enough for the string and
4384null-terminator.
4385
4386If @var{str} is not @code{NULL}, it should point to a block of storage large
4387enough for the result, that being
4388
4389@example
4390mpz_sizeinbase (mpq_numref(@var{op}), @var{base})
4391+ mpz_sizeinbase (mpq_denref(@var{op}), @var{base}) + 3
4392@end example
4393
4394The three extra bytes are for a possible minus sign, possible slash, and the
4395null-terminator.
4396
4397A pointer to the result string is returned, being either the allocated block,
4398or the given @var{str}.
4399@end deftypefun
4400
4401
4402@node Rational Arithmetic, Comparing Rationals, Rational Conversions, Rational Number Functions
4403@comment node-name, next, previous, up
4404@section Arithmetic Functions
4405@cindex Rational arithmetic functions
4406@cindex Arithmetic functions
4407
4408@deftypefun void mpq_add (mpq_t @var{sum}, const mpq_t @var{addend1}, const mpq_t @var{addend2})
4409Set @var{sum} to @var{addend1} + @var{addend2}.
4410@end deftypefun
4411
4412@deftypefun void mpq_sub (mpq_t @var{difference}, const mpq_t @var{minuend}, const mpq_t @var{subtrahend})
4413Set @var{difference} to @var{minuend} @minus{} @var{subtrahend}.
4414@end deftypefun
4415
4416@deftypefun void mpq_mul (mpq_t @var{product}, const mpq_t @var{multiplier}, const mpq_t @var{multiplicand})
4417Set @var{product} to @math{@var{multiplier} @GMPtimes{} @var{multiplicand}}.
4418@end deftypefun
4419
4420@deftypefun void mpq_mul_2exp (mpq_t @var{rop}, const mpq_t @var{op1}, mp_bitcnt_t @var{op2})
4421Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to
4422@var{op2}}.
4423@end deftypefun
4424
4425@deftypefun void mpq_div (mpq_t @var{quotient}, const mpq_t @var{dividend}, const mpq_t @var{divisor})
4426@cindex Division functions
4427Set @var{quotient} to @var{dividend}/@var{divisor}.
4428@end deftypefun
4429
4430@deftypefun void mpq_div_2exp (mpq_t @var{rop}, const mpq_t @var{op1}, mp_bitcnt_t @var{op2})
4431Set @var{rop} to @m{@var{op1}/2^{op2}, @var{op1} divided by 2 raised to
4432@var{op2}}.
4433@end deftypefun
4434
4435@deftypefun void mpq_neg (mpq_t @var{negated_operand}, const mpq_t @var{operand})
4436Set @var{negated_operand} to @minus{}@var{operand}.
4437@end deftypefun
4438
4439@deftypefun void mpq_abs (mpq_t @var{rop}, const mpq_t @var{op})
4440Set @var{rop} to the absolute value of @var{op}.
4441@end deftypefun
4442
4443@deftypefun void mpq_inv (mpq_t @var{inverted_number}, const mpq_t @var{number})
4444Set @var{inverted_number} to 1/@var{number}. If the new denominator is
4445zero, this routine will divide by zero.
4446@end deftypefun
4447
4448@node Comparing Rationals, Applying Integer Functions, Rational Arithmetic, Rational Number Functions
4449@comment node-name, next, previous, up
4450@section Comparison Functions
4451@cindex Rational comparison functions
4452@cindex Comparison functions
4453
4454@deftypefun int mpq_cmp (const mpq_t @var{op1}, const mpq_t @var{op2})
4455@deftypefunx int mpq_cmp_z (const mpq_t @var{op1}, const mpz_t @var{op2})
4456Compare @var{op1} and @var{op2}. Return a positive value if @math{@var{op1} >
4457@var{op2}}, zero if @math{@var{op1} = @var{op2}}, and a negative value if
4458@math{@var{op1} < @var{op2}}.
4459
4460To determine if two rationals are equal, @code{mpq_equal} is faster than
4461@code{mpq_cmp}.
4462@end deftypefun
4463
4464@deftypefn Macro int mpq_cmp_ui (const mpq_t @var{op1}, unsigned long int @var{num2}, unsigned long int @var{den2})
4465@deftypefnx Macro int mpq_cmp_si (const mpq_t @var{op1}, long int @var{num2}, unsigned long int @var{den2})
4466Compare @var{op1} and @var{num2}/@var{den2}. Return a positive value if
4467@math{@var{op1} > @var{num2}/@var{den2}}, zero if @math{@var{op1} =
4468@var{num2}/@var{den2}}, and a negative value if @math{@var{op1} <
4469@var{num2}/@var{den2}}.
4470
4471@var{num2} and @var{den2} are allowed to have common factors.
4472
4473These functions are implemented as a macros and evaluate their arguments
4474multiple times.
4475@end deftypefn
4476
4477@deftypefn Macro int mpq_sgn (const mpq_t @var{op})
4478@cindex Sign tests
4479@cindex Rational sign tests
4480Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
4481@math{-1} if @math{@var{op} < 0}.
4482
4483This function is actually implemented as a macro. It evaluates its
4484argument multiple times.
4485@end deftypefn
4486
4487@deftypefun int mpq_equal (const mpq_t @var{op1}, const mpq_t @var{op2})
4488Return non-zero if @var{op1} and @var{op2} are equal, zero if they are
4489non-equal. Although @code{mpq_cmp} can be used for the same purpose, this
4490function is much faster.
4491@end deftypefun
4492
4493@node Applying Integer Functions, I/O of Rationals, Comparing Rationals, Rational Number Functions
4494@comment node-name, next, previous, up
4495@section Applying Integer Functions to Rationals
4496@cindex Rational numerator and denominator
4497@cindex Numerator and denominator
4498
4499The set of @code{mpq} functions is quite small. In particular, there are few
4500functions for either input or output. The following functions give direct
4501access to the numerator and denominator of an @code{mpq_t}.
4502
4503Note that if an assignment to the numerator and/or denominator could take an
4504@code{mpq_t} out of the canonical form described at the start of this chapter
4505(@pxref{Rational Number Functions}) then @code{mpq_canonicalize} must be
4506called before any other @code{mpq} functions are applied to that @code{mpq_t}.
4507
4508@deftypefn Macro mpz_t mpq_numref (const mpq_t @var{op})
4509@deftypefnx Macro mpz_t mpq_denref (const mpq_t @var{op})
4510Return a reference to the numerator and denominator of @var{op}, respectively.
4511The @code{mpz} functions can be used on the result of these macros.
4512@end deftypefn
4513
4514@deftypefun void mpq_get_num (mpz_t @var{numerator}, const mpq_t @var{rational})
4515@deftypefunx void mpq_get_den (mpz_t @var{denominator}, const mpq_t @var{rational})
4516@deftypefunx void mpq_set_num (mpq_t @var{rational}, const mpz_t @var{numerator})
4517@deftypefunx void mpq_set_den (mpq_t @var{rational}, const mpz_t @var{denominator})
4518Get or set the numerator or denominator of a rational. These functions are
4519equivalent to calling @code{mpz_set} with an appropriate @code{mpq_numref} or
4520@code{mpq_denref}. Direct use of @code{mpq_numref} or @code{mpq_denref} is
4521recommended instead of these functions.
4522@end deftypefun
4523
4524
4525@need 2000
4526@node I/O of Rationals, , Applying Integer Functions, Rational Number Functions
4527@comment node-name, next, previous, up
4528@section Input and Output Functions
4529@cindex Rational input and output functions
4530@cindex Input functions
4531@cindex Output functions
4532@cindex I/O functions
4533
4534Functions that perform input from a stdio stream, and functions that output to
4535a stdio stream, of @code{mpq} numbers. Passing a @code{NULL} pointer for a
4536@var{stream} argument to any of these functions will make them read from
4537@code{stdin} and write to @code{stdout}, respectively.
4538
4539When using any of these functions, it is a good idea to include @file{stdio.h}
4540before @file{gmp.h}, since that will allow @file{gmp.h} to define prototypes
4541for these functions.
4542
4543See also @ref{Formatted Output} and @ref{Formatted Input}.
4544
4545@deftypefun size_t mpq_out_str (FILE *@var{stream}, int @var{base}, const mpq_t @var{op})
4546Output @var{op} on stdio stream @var{stream}, as a string of digits in base
4547@var{base}. The base argument may vary from 2 to 62 or from @minus{}2 to
4548@minus{}36. Output is in the form
4549@samp{num/den} or if the denominator is 1 then just @samp{num}.
4550
4551For @var{base} in the range 2..36, digits and lower-case letters are used; for
4552@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
4553digits, upper-case letters, and lower-case letters (in that significance order)
4554are used.
4555
4556Return the number of bytes written, or if an error occurred, return 0.
4557@end deftypefun
4558
4559@deftypefun size_t mpq_inp_str (mpq_t @var{rop}, FILE *@var{stream}, int @var{base})
4560Read a string of digits from @var{stream} and convert them to a rational in
4561@var{rop}. Any initial white-space characters are read and discarded. Return
4562the number of characters read (including white space), or 0 if a rational
4563could not be read.
4564
4565The input can be a fraction like @samp{17/63} or just an integer like
4566@samp{123}. Reading stops at the first character not in this form, and white
4567space is not permitted within the string. If the input might not be in
4568canonical form, then @code{mpq_canonicalize} must be called (@pxref{Rational
4569Number Functions}).
4570
4571The @var{base} can be between 2 and 62, or can be 0 in which case the leading
4572characters of the string determine the base, @samp{0x} or @samp{0X} for
4573hexadecimal, @code{0b} and @code{0B} for binary, @samp{0} for octal, or
4574decimal otherwise. The leading characters
4575are examined separately for the numerator and denominator of a fraction, so
4576for instance @samp{0x10/11} is @math{16/11}, whereas @samp{0x10/0x11} is
4577@math{16/17}.
4578@end deftypefun
4579
4580
4581@node Floating-point Functions, Low-level Functions, Rational Number Functions, Top
4582@comment node-name, next, previous, up
4583@chapter Floating-point Functions
4584@cindex Floating-point functions
4585@cindex Float functions
4586@cindex User-defined precision
4587@cindex Precision of floats
4588
4589GMP floating point numbers are stored in objects of type @code{mpf_t} and
4590functions operating on them have an @code{mpf_} prefix.
4591
4592The mantissa of each float has a user-selectable precision, in practice only
4593limited by available memory. Each variable has its own precision, and that can
4594be increased or decreased at any time. This selectable precision is a minimum
4595value, GMP rounds it up to a whole limb.
4596
4597The accuracy of a calculation is determined by the priorly set precision of the
4598destination variable and the numeric values of the input variables. Input
4599variables' set precisions do not affect calculations (except indirectly as
4600their values might have been affected when they were assigned).
4601
4602The exponent of each float has fixed precision, one machine word on most
4603systems. In the current implementation the exponent is a count of limbs, so
4604for example on a 32-bit system this means a range of roughly
4605@math{2^@W{-68719476768}} to @math{2^@W{68719476736}}, or on a 64-bit system
4606this will be much greater. Note however that @code{mpf_get_str} can only
4607return an exponent which fits an @code{mp_exp_t} and currently
4608@code{mpf_set_str} doesn't accept exponents bigger than a @code{long}.
4609
4610Each variable keeps track of the mantissa data actually in use. This means
4611that if a float is exactly represented in only a few bits then only those bits
4612will be used in a calculation, even if the variable's selected precision is
4613high. This is a performance optimization; it does not affect the numeric
4614results.
4615
4616Internally, GMP sometimes calculates with higher precision than that of the
4617destination variable in order to limit errors. Final results are always
4618truncated to the destination variable's precision.
4619
4620The mantissa is stored in binary. One consequence of this is that decimal
4621fractions like @math{0.1} cannot be represented exactly. The same is true of
4622plain IEEE @code{double} floats. This makes both highly unsuitable for
4623calculations involving money or other values that should be exact decimal
4624fractions. (Suitably scaled integers, or perhaps rationals, are better
4625choices.)
4626
4627The @code{mpf} functions and variables have no special notion of infinity or
4628not-a-number, and applications must take care not to overflow the exponent or
4629results will be unpredictable.
4630
4631Note that the @code{mpf} functions are @emph{not} intended as a smooth
4632extension to IEEE P754 arithmetic. In particular results obtained on one
4633computer often differ from the results on a computer with a different word
4634size.
4635
4636New projects should consider using the GMP extension library MPFR
4637(@url{http://mpfr.org}) instead. MPFR provides well-defined precision and
4638accurate rounding, and thereby naturally extends IEEE P754.
4639
4640@menu
4641* Initializing Floats::
4642* Assigning Floats::
4643* Simultaneous Float Init & Assign::
4644* Converting Floats::
4645* Float Arithmetic::
4646* Float Comparison::
4647* I/O of Floats::
4648* Miscellaneous Float Functions::
4649@end menu
4650
4651@node Initializing Floats, Assigning Floats, Floating-point Functions, Floating-point Functions
4652@comment node-name, next, previous, up
4653@section Initialization Functions
4654@cindex Float initialization functions
4655@cindex Initialization functions
4656
4657@deftypefun void mpf_set_default_prec (mp_bitcnt_t @var{prec})
4658Set the default precision to be @strong{at least} @var{prec} bits. All
4659subsequent calls to @code{mpf_init} will use this precision, but previously
4660initialized variables are unaffected.
4661@end deftypefun
4662
4663@deftypefun {mp_bitcnt_t} mpf_get_default_prec (void)
4664Return the default precision actually used.
4665@end deftypefun
4666
4667An @code{mpf_t} object must be initialized before storing the first value in
4668it. The functions @code{mpf_init} and @code{mpf_init2} are used for that
4669purpose.
4670
4671@deftypefun void mpf_init (mpf_t @var{x})
4672Initialize @var{x} to 0. Normally, a variable should be initialized once only
4673or at least be cleared, using @code{mpf_clear}, between initializations. The
4674precision of @var{x} is undefined unless a default precision has already been
4675established by a call to @code{mpf_set_default_prec}.
4676@end deftypefun
4677
4678@deftypefun void mpf_init2 (mpf_t @var{x}, mp_bitcnt_t @var{prec})
4679Initialize @var{x} to 0 and set its precision to be @strong{at least}
4680@var{prec} bits. Normally, a variable should be initialized once only or at
4681least be cleared, using @code{mpf_clear}, between initializations.
4682@end deftypefun
4683
4684@deftypefun void mpf_inits (mpf_t @var{x}, ...)
4685Initialize a NULL-terminated list of @code{mpf_t} variables, and set their
4686values to 0. The precision of the initialized variables is undefined unless a
4687default precision has already been established by a call to
4688@code{mpf_set_default_prec}.
4689@end deftypefun
4690
4691@deftypefun void mpf_clear (mpf_t @var{x})
4692Free the space occupied by @var{x}. Make sure to call this function for all
4693@code{mpf_t} variables when you are done with them.
4694@end deftypefun
4695
4696@deftypefun void mpf_clears (mpf_t @var{x}, ...)
4697Free the space occupied by a NULL-terminated list of @code{mpf_t} variables.
4698@end deftypefun
4699
4700@need 2000
4701Here is an example on how to initialize floating-point variables:
4702@example
4703@{
4704 mpf_t x, y;
4705 mpf_init (x); /* use default precision */
4706 mpf_init2 (y, 256); /* precision @emph{at least} 256 bits */
4707 @dots{}
4708 /* Unless the program is about to exit, do ... */
4709 mpf_clear (x);
4710 mpf_clear (y);
4711@}
4712@end example
4713
4714The following three functions are useful for changing the precision during a
4715calculation. A typical use would be for adjusting the precision gradually in
4716iterative algorithms like Newton-Raphson, making the computation precision
4717closely match the actual accurate part of the numbers.
4718
4719@deftypefun {mp_bitcnt_t} mpf_get_prec (const mpf_t @var{op})
4720Return the current precision of @var{op}, in bits.
4721@end deftypefun
4722
4723@deftypefun void mpf_set_prec (mpf_t @var{rop}, mp_bitcnt_t @var{prec})
4724Set the precision of @var{rop} to be @strong{at least} @var{prec} bits. The
4725value in @var{rop} will be truncated to the new precision.
4726
4727This function requires a call to @code{realloc}, and so should not be used in
4728a tight loop.
4729@end deftypefun
4730
4731@deftypefun void mpf_set_prec_raw (mpf_t @var{rop}, mp_bitcnt_t @var{prec})
4732Set the precision of @var{rop} to be @strong{at least} @var{prec} bits,
4733without changing the memory allocated.
4734
4735@var{prec} must be no more than the allocated precision for @var{rop}, that
4736being the precision when @var{rop} was initialized, or in the most recent
4737@code{mpf_set_prec}.
4738
4739The value in @var{rop} is unchanged, and in particular if it had a higher
4740precision than @var{prec} it will retain that higher precision. New values
4741written to @var{rop} will use the new @var{prec}.
4742
4743Before calling @code{mpf_clear} or the full @code{mpf_set_prec}, another
4744@code{mpf_set_prec_raw} call must be made to restore @var{rop} to its original
4745allocated precision. Failing to do so will have unpredictable results.
4746
4747@code{mpf_get_prec} can be used before @code{mpf_set_prec_raw} to get the
4748original allocated precision. After @code{mpf_set_prec_raw} it reflects the
4749@var{prec} value set.
4750
4751@code{mpf_set_prec_raw} is an efficient way to use an @code{mpf_t} variable at
4752different precisions during a calculation, perhaps to gradually increase
4753precision in an iteration, or just to use various different precisions for
4754different purposes during a calculation.
4755@end deftypefun
4756
4757
4758@need 2000
4759@node Assigning Floats, Simultaneous Float Init & Assign, Initializing Floats, Floating-point Functions
4760@comment node-name, next, previous, up
4761@section Assignment Functions
4762@cindex Float assignment functions
4763@cindex Assignment functions
4764
4765These functions assign new values to already initialized floats
4766(@pxref{Initializing Floats}).
4767
4768@deftypefun void mpf_set (mpf_t @var{rop}, const mpf_t @var{op})
4769@deftypefunx void mpf_set_ui (mpf_t @var{rop}, unsigned long int @var{op})
4770@deftypefunx void mpf_set_si (mpf_t @var{rop}, signed long int @var{op})
4771@deftypefunx void mpf_set_d (mpf_t @var{rop}, double @var{op})
4772@deftypefunx void mpf_set_z (mpf_t @var{rop}, const mpz_t @var{op})
4773@deftypefunx void mpf_set_q (mpf_t @var{rop}, const mpq_t @var{op})
4774Set the value of @var{rop} from @var{op}.
4775@end deftypefun
4776
4777@deftypefun int mpf_set_str (mpf_t @var{rop}, const char *@var{str}, int @var{base})
4778Set the value of @var{rop} from the string in @var{str}. The string is of the
4779form @samp{M@@N} or, if the base is 10 or less, alternatively @samp{MeN}.
4780@samp{M} is the mantissa and @samp{N} is the exponent. The mantissa is always
4781in the specified base. The exponent is either in the specified base or, if
4782@var{base} is negative, in decimal. The decimal point expected is taken from
4783the current locale, on systems providing @code{localeconv}.
4784
4785The argument @var{base} may be in the ranges 2 to 62, or @minus{}62 to
4786@minus{}2. Negative values are used to specify that the exponent is in
4787decimal.
4788
4789For bases up to 36, case is ignored; upper-case and lower-case letters have
4790the same value; for bases 37 to 62, upper-case letter represent the usual
479110..35 while lower-case letter represent 36..61.
4792
4793Unlike the corresponding @code{mpz} function, the base will not be determined
4794from the leading characters of the string if @var{base} is 0. This is so that
4795numbers like @samp{0.23} are not interpreted as octal.
4796
4797White space is allowed in the string, and is simply ignored. [This is not
4798really true; white-space is ignored in the beginning of the string and within
4799the mantissa, but not in other places, such as after a minus sign or in the
4800exponent. We are considering changing the definition of this function, making
4801it fail when there is any white-space in the input, since that makes a lot of
4802sense. Please tell us your opinion about this change. Do you really want it
4803to accept @nicode{"3 14"} as meaning 314 as it does now?]
4804
4805This function returns 0 if the entire string is a valid number in base
4806@var{base}. Otherwise it returns @minus{}1.
4807@end deftypefun
4808
4809@deftypefun void mpf_swap (mpf_t @var{rop1}, mpf_t @var{rop2})
4810Swap @var{rop1} and @var{rop2} efficiently. Both the values and the
4811precisions of the two variables are swapped.
4812@end deftypefun
4813
4814
4815@node Simultaneous Float Init & Assign, Converting Floats, Assigning Floats, Floating-point Functions
4816@comment node-name, next, previous, up
4817@section Combined Initialization and Assignment Functions
4818@cindex Float assignment functions
4819@cindex Assignment functions
4820@cindex Float initialization functions
4821@cindex Initialization functions
4822
4823For convenience, GMP provides a parallel series of initialize-and-set functions
4824which initialize the output and then store the value there. These functions'
4825names have the form @code{mpf_init_set@dots{}}
4826
4827Once the float has been initialized by any of the @code{mpf_init_set@dots{}}
4828functions, it can be used as the source or destination operand for the ordinary
4829float functions. Don't use an initialize-and-set function on a variable
4830already initialized!
4831
4832@deftypefun void mpf_init_set (mpf_t @var{rop}, const mpf_t @var{op})
4833@deftypefunx void mpf_init_set_ui (mpf_t @var{rop}, unsigned long int @var{op})
4834@deftypefunx void mpf_init_set_si (mpf_t @var{rop}, signed long int @var{op})
4835@deftypefunx void mpf_init_set_d (mpf_t @var{rop}, double @var{op})
4836Initialize @var{rop} and set its value from @var{op}.
4837
4838The precision of @var{rop} will be taken from the active default precision, as
4839set by @code{mpf_set_default_prec}.
4840@end deftypefun
4841
4842@deftypefun int mpf_init_set_str (mpf_t @var{rop}, const char *@var{str}, int @var{base})
4843Initialize @var{rop} and set its value from the string in @var{str}. See
4844@code{mpf_set_str} above for details on the assignment operation.
4845
4846Note that @var{rop} is initialized even if an error occurs. (I.e., you have to
4847call @code{mpf_clear} for it.)
4848
4849The precision of @var{rop} will be taken from the active default precision, as
4850set by @code{mpf_set_default_prec}.
4851@end deftypefun
4852
4853
4854@node Converting Floats, Float Arithmetic, Simultaneous Float Init & Assign, Floating-point Functions
4855@comment node-name, next, previous, up
4856@section Conversion Functions
4857@cindex Float conversion functions
4858@cindex Conversion functions
4859
4860@deftypefun double mpf_get_d (const mpf_t @var{op})
4861Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
4862towards zero).
4863
4864If the exponent in @var{op} is too big or too small to fit a @code{double}
4865then the result is system dependent. For too big an infinity is returned when
4866available. For too small @math{0.0} is normally returned. Hardware overflow,
4867underflow and denorm traps may or may not occur.
4868@end deftypefun
4869
4870@deftypefun double mpf_get_d_2exp (signed long int *@var{exp}, const mpf_t @var{op})
4871Convert @var{op} to a @code{double}, truncating if necessary (i.e.@: rounding
4872towards zero), and with an exponent returned separately.
4873
4874The return value is in the range @math{0.5@le{}@GMPabs{@var{d}}<1} and the
4875exponent is stored to @code{*@var{exp}}. @m{@var{d} \times 2^{exp},
4876@var{d} * 2^@var{exp}} is the (truncated) @var{op} value. If @var{op} is zero,
4877the return is @math{0.0} and 0 is stored to @code{*@var{exp}}.
4878
4879@cindex @code{frexp}
4880This is similar to the standard C @code{frexp} function (@pxref{Normalization
4881Functions,,, libc, The GNU C Library Reference Manual}).
4882@end deftypefun
4883
4884@deftypefun long mpf_get_si (const mpf_t @var{op})
4885@deftypefunx {unsigned long} mpf_get_ui (const mpf_t @var{op})
4886Convert @var{op} to a @code{long} or @code{unsigned long}, truncating any
4887fraction part. If @var{op} is too big for the return type, the result is
4888undefined.
4889
4890See also @code{mpf_fits_slong_p} and @code{mpf_fits_ulong_p}
4891(@pxref{Miscellaneous Float Functions}).
4892@end deftypefun
4893
4894@deftypefun {char *} mpf_get_str (char *@var{str}, mp_exp_t *@var{expptr}, int @var{base}, size_t @var{n_digits}, const mpf_t @var{op})
4895Convert @var{op} to a string of digits in base @var{base}. The base argument
4896may vary from 2 to 62 or from @minus{}2 to @minus{}36. Up to @var{n_digits}
4897digits will be generated. Trailing zeros are not returned. No more digits
4898than can be accurately represented by @var{op} are ever generated. If
4899@var{n_digits} is 0 then that accurate maximum number of digits are generated.
4900
4901For @var{base} in the range 2..36, digits and lower-case letters are used; for
4902@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
4903digits, upper-case letters, and lower-case letters (in that significance order)
4904are used.
4905
4906If @var{str} is @code{NULL}, the result string is allocated using the current
4907allocation function (@pxref{Custom Allocation}). The block will be
4908@code{strlen(str)+1} bytes, that being exactly enough for the string and
4909null-terminator.
4910
4911If @var{str} is not @code{NULL}, it should point to a block of
4912@math{@var{n_digits} + 2} bytes, that being enough for the mantissa, a
4913possible minus sign, and a null-terminator. When @var{n_digits} is 0 to get
4914all significant digits, an application won't be able to know the space
4915required, and @var{str} should be @code{NULL} in that case.
4916
4917The generated string is a fraction, with an implicit radix point immediately
4918to the left of the first digit. The applicable exponent is written through
4919the @var{expptr} pointer. For example, the number 3.1416 would be returned as
4920string @nicode{"31416"} and exponent 1.
4921
4922When @var{op} is zero, an empty string is produced and the exponent returned
4923is 0.
4924
4925A pointer to the result string is returned, being either the allocated block
4926or the given @var{str}.
4927@end deftypefun
4928
4929
4930@node Float Arithmetic, Float Comparison, Converting Floats, Floating-point Functions
4931@comment node-name, next, previous, up
4932@section Arithmetic Functions
4933@cindex Float arithmetic functions
4934@cindex Arithmetic functions
4935
4936@deftypefun void mpf_add (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
4937@deftypefunx void mpf_add_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
4938Set @var{rop} to @math{@var{op1} + @var{op2}}.
4939@end deftypefun
4940
4941@deftypefun void mpf_sub (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
4942@deftypefunx void mpf_ui_sub (mpf_t @var{rop}, unsigned long int @var{op1}, const mpf_t @var{op2})
4943@deftypefunx void mpf_sub_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
4944Set @var{rop} to @var{op1} @minus{} @var{op2}.
4945@end deftypefun
4946
4947@deftypefun void mpf_mul (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
4948@deftypefunx void mpf_mul_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
4949Set @var{rop} to @math{@var{op1} @GMPtimes{} @var{op2}}.
4950@end deftypefun
4951
4952Division is undefined if the divisor is zero, and passing a zero divisor to the
4953divide functions will make these functions intentionally divide by zero. This
4954lets the user handle arithmetic exceptions in these functions in the same
4955manner as other arithmetic exceptions.
4956
4957@deftypefun void mpf_div (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
4958@deftypefunx void mpf_ui_div (mpf_t @var{rop}, unsigned long int @var{op1}, const mpf_t @var{op2})
4959@deftypefunx void mpf_div_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
4960@cindex Division functions
4961Set @var{rop} to @var{op1}/@var{op2}.
4962@end deftypefun
4963
4964@deftypefun void mpf_sqrt (mpf_t @var{rop}, const mpf_t @var{op})
4965@deftypefunx void mpf_sqrt_ui (mpf_t @var{rop}, unsigned long int @var{op})
4966@cindex Root extraction functions
4967Set @var{rop} to @m{\sqrt{@var{op}}, the square root of @var{op}}.
4968@end deftypefun
4969
4970@deftypefun void mpf_pow_ui (mpf_t @var{rop}, const mpf_t @var{op1}, unsigned long int @var{op2})
4971@cindex Exponentiation functions
4972@cindex Powering functions
4973Set @var{rop} to @m{@var{op1}^{op2}, @var{op1} raised to the power @var{op2}}.
4974@end deftypefun
4975
4976@deftypefun void mpf_neg (mpf_t @var{rop}, const mpf_t @var{op})
4977Set @var{rop} to @minus{}@var{op}.
4978@end deftypefun
4979
4980@deftypefun void mpf_abs (mpf_t @var{rop}, const mpf_t @var{op})
4981Set @var{rop} to the absolute value of @var{op}.
4982@end deftypefun
4983
4984@deftypefun void mpf_mul_2exp (mpf_t @var{rop}, const mpf_t @var{op1}, mp_bitcnt_t @var{op2})
4985Set @var{rop} to @m{@var{op1} \times 2^{op2}, @var{op1} times 2 raised to
4986@var{op2}}.
4987@end deftypefun
4988
4989@deftypefun void mpf_div_2exp (mpf_t @var{rop}, const mpf_t @var{op1}, mp_bitcnt_t @var{op2})
4990Set @var{rop} to @m{@var{op1}/2^{op2}, @var{op1} divided by 2 raised to
4991@var{op2}}.
4992@end deftypefun
4993
4994@node Float Comparison, I/O of Floats, Float Arithmetic, Floating-point Functions
4995@comment node-name, next, previous, up
4996@section Comparison Functions
4997@cindex Float comparison functions
4998@cindex Comparison functions
4999
5000@deftypefun int mpf_cmp (const mpf_t @var{op1}, const mpf_t @var{op2})
5001@deftypefunx int mpf_cmp_z (const mpf_t @var{op1}, const mpz_t @var{op2})
5002@deftypefunx int mpf_cmp_d (const mpf_t @var{op1}, double @var{op2})
5003@deftypefunx int mpf_cmp_ui (const mpf_t @var{op1}, unsigned long int @var{op2})
5004@deftypefunx int mpf_cmp_si (const mpf_t @var{op1}, signed long int @var{op2})
5005Compare @var{op1} and @var{op2}. Return a positive value if @math{@var{op1} >
5006@var{op2}}, zero if @math{@var{op1} = @var{op2}}, and a negative value if
5007@math{@var{op1} < @var{op2}}.
5008
5009@code{mpf_cmp_d} can be called with an infinity, but results are undefined for
5010a NaN.
5011@end deftypefun
5012
5013@deftypefun int mpf_eq (const mpf_t @var{op1}, const mpf_t @var{op2}, mp_bitcnt_t op3)
5014@strong{This function is mathematically ill-defined and should not be used.}
5015
5016Return non-zero if the first @var{op3} bits of @var{op1} and @var{op2} are
5017equal, zero otherwise. Note that numbers like e.g., 256 (binary 100000000) and
5018255 (binary 11111111) will never be equal by this function's measure, and
5019furthermore that 0 will only be equal to itself.
5020@end deftypefun
5021
5022@deftypefun void mpf_reldiff (mpf_t @var{rop}, const mpf_t @var{op1}, const mpf_t @var{op2})
5023Compute the relative difference between @var{op1} and @var{op2} and store the
5024result in @var{rop}. This is @math{@GMPabs{@var{op1}-@var{op2}}/@var{op1}}.
5025@end deftypefun
5026
5027@deftypefn Macro int mpf_sgn (const mpf_t @var{op})
5028@cindex Sign tests
5029@cindex Float sign tests
5030Return @math{+1} if @math{@var{op} > 0}, 0 if @math{@var{op} = 0}, and
5031@math{-1} if @math{@var{op} < 0}.
5032
5033This function is actually implemented as a macro. It evaluates its argument
5034multiple times.
5035@end deftypefn
5036
5037@node I/O of Floats, Miscellaneous Float Functions, Float Comparison, Floating-point Functions
5038@comment node-name, next, previous, up
5039@section Input and Output Functions
5040@cindex Float input and output functions
5041@cindex Input functions
5042@cindex Output functions
5043@cindex I/O functions
5044
5045Functions that perform input from a stdio stream, and functions that output to
5046a stdio stream, of @code{mpf} numbers. Passing a @code{NULL} pointer for a
5047@var{stream} argument to any of these functions will make them read from
5048@code{stdin} and write to @code{stdout}, respectively.
5049
5050When using any of these functions, it is a good idea to include @file{stdio.h}
5051before @file{gmp.h}, since that will allow @file{gmp.h} to define prototypes
5052for these functions.
5053
5054See also @ref{Formatted Output} and @ref{Formatted Input}.
5055
5056@deftypefun size_t mpf_out_str (FILE *@var{stream}, int @var{base}, size_t @var{n_digits}, const mpf_t @var{op})
5057Print @var{op} to @var{stream}, as a string of digits. Return the number of
5058bytes written, or if an error occurred, return 0.
5059
5060The mantissa is prefixed with an @samp{0.} and is in the given @var{base},
5061which may vary from 2 to 62 or from @minus{}2 to @minus{}36. An exponent is
5062then printed, separated by an @samp{e}, or if the base is greater than 10 then
5063by an @samp{@@}. The exponent is always in decimal. The decimal point follows
5064the current locale, on systems providing @code{localeconv}.
5065
5066For @var{base} in the range 2..36, digits and lower-case letters are used; for
5067@minus{}2..@minus{}36, digits and upper-case letters are used; for 37..62,
5068digits, upper-case letters, and lower-case letters (in that significance order)
5069are used.
5070
5071Up to @var{n_digits} will be printed from the mantissa, except that no more
5072digits than are accurately representable by @var{op} will be printed.
5073@var{n_digits} can be 0 to select that accurate maximum.
5074@end deftypefun
5075
5076@deftypefun size_t mpf_inp_str (mpf_t @var{rop}, FILE *@var{stream}, int @var{base})
5077Read a string in base @var{base} from @var{stream}, and put the read float in
5078@var{rop}. The string is of the form @samp{M@@N} or, if the base is 10 or
5079less, alternatively @samp{MeN}. @samp{M} is the mantissa and @samp{N} is the
5080exponent. The mantissa is always in the specified base. The exponent is
5081either in the specified base or, if @var{base} is negative, in decimal. The
5082decimal point expected is taken from the current locale, on systems providing
5083@code{localeconv}.
5084
5085The argument @var{base} may be in the ranges 2 to 36, or @minus{}36 to
5086@minus{}2. Negative values are used to specify that the exponent is in
5087decimal.
5088
5089Unlike the corresponding @code{mpz} function, the base will not be determined
5090from the leading characters of the string if @var{base} is 0. This is so that
5091numbers like @samp{0.23} are not interpreted as octal.
5092
5093Return the number of bytes read, or if an error occurred, return 0.
5094@end deftypefun
5095
5096@c @deftypefun void mpf_out_raw (FILE *@var{stream}, const mpf_t @var{float})
5097@c Output @var{float} on stdio stream @var{stream}, in raw binary
5098@c format. The float is written in a portable format, with 4 bytes of
5099@c size information, and that many bytes of limbs. Both the size and the
5100@c limbs are written in decreasing significance order.
5101@c @end deftypefun
5102
5103@c @deftypefun void mpf_inp_raw (mpf_t @var{float}, FILE *@var{stream})
5104@c Input from stdio stream @var{stream} in the format written by
5105@c @code{mpf_out_raw}, and put the result in @var{float}.
5106@c @end deftypefun
5107
5108
5109@node Miscellaneous Float Functions, , I/O of Floats, Floating-point Functions
5110@comment node-name, next, previous, up
5111@section Miscellaneous Functions
5112@cindex Miscellaneous float functions
5113@cindex Float miscellaneous functions
5114
5115@deftypefun void mpf_ceil (mpf_t @var{rop}, const mpf_t @var{op})
5116@deftypefunx void mpf_floor (mpf_t @var{rop}, const mpf_t @var{op})
5117@deftypefunx void mpf_trunc (mpf_t @var{rop}, const mpf_t @var{op})
5118@cindex Rounding functions
5119@cindex Float rounding functions
5120Set @var{rop} to @var{op} rounded to an integer. @code{mpf_ceil} rounds to the
5121next higher integer, @code{mpf_floor} to the next lower, and @code{mpf_trunc}
5122to the integer towards zero.
5123@end deftypefun
5124
5125@deftypefun int mpf_integer_p (const mpf_t @var{op})
5126Return non-zero if @var{op} is an integer.
5127@end deftypefun
5128
5129@deftypefun int mpf_fits_ulong_p (const mpf_t @var{op})
5130@deftypefunx int mpf_fits_slong_p (const mpf_t @var{op})
5131@deftypefunx int mpf_fits_uint_p (const mpf_t @var{op})
5132@deftypefunx int mpf_fits_sint_p (const mpf_t @var{op})
5133@deftypefunx int mpf_fits_ushort_p (const mpf_t @var{op})
5134@deftypefunx int mpf_fits_sshort_p (const mpf_t @var{op})
5135Return non-zero if @var{op} would fit in the respective C data type, when
5136truncated to an integer.
5137@end deftypefun
5138
5139@deftypefun void mpf_urandomb (mpf_t @var{rop}, gmp_randstate_t @var{state}, mp_bitcnt_t @var{nbits})
5140@cindex Random number functions
5141@cindex Float random number functions
5142Generate a uniformly distributed random float in @var{rop}, such that @math{0
5143@le{} @var{rop} < 1}, with @var{nbits} significant bits in the mantissa or
5144less if the precision of @var{rop} is smaller.
5145
5146The variable @var{state} must be initialized by calling one of the
5147@code{gmp_randinit} functions (@ref{Random State Initialization}) before
5148invoking this function.
5149@end deftypefun
5150
5151@deftypefun void mpf_random2 (mpf_t @var{rop}, mp_size_t @var{max_size}, mp_exp_t @var{exp})
5152Generate a random float of at most @var{max_size} limbs, with long strings of
5153zeros and ones in the binary representation. The exponent of the number is in
5154the interval @minus{}@var{exp} to @var{exp} (in limbs). This function is
5155useful for testing functions and algorithms, since these kind of random
5156numbers have proven to be more likely to trigger corner-case bugs. Negative
5157random numbers are generated when @var{max_size} is negative.
5158@end deftypefun
5159
5160@c @deftypefun size_t mpf_size (const mpf_t @var{op})
5161@c Return the size of @var{op} measured in number of limbs. If @var{op} is
5162@c zero, the returned value will be zero. (@xref{Nomenclature}, for an
5163@c explanation of the concept @dfn{limb}.)
5164@c
5165@c @strong{This function is obsolete. It will disappear from future GMP
5166@c releases.}
5167@c @end deftypefun
5168
5169
5170@node Low-level Functions, Random Number Functions, Floating-point Functions, Top
5171@comment node-name, next, previous, up
5172@chapter Low-level Functions
5173@cindex Low-level functions
5174
5175This chapter describes low-level GMP functions, used to implement the
5176high-level GMP functions, but also intended for time-critical user code.
5177
5178These functions start with the prefix @code{mpn_}.
5179
5180@c 1. Some of these function clobber input operands.
5181@c
5182
5183The @code{mpn} functions are designed to be as fast as possible, @strong{not}
5184to provide a coherent calling interface. The different functions have somewhat
5185similar interfaces, but there are variations that make them hard to use. These
5186functions do as little as possible apart from the real multiple precision
5187computation, so that no time is spent on things that not all callers need.
5188
5189A source operand is specified by a pointer to the least significant limb and a
5190limb count. A destination operand is specified by just a pointer. It is the
5191responsibility of the caller to ensure that the destination has enough space
5192for storing the result.
5193
5194With this way of specifying operands, it is possible to perform computations on
5195subranges of an argument, and store the result into a subrange of a
5196destination.
5197
5198A common requirement for all functions is that each source area needs at least
5199one limb. No size argument may be zero. Unless otherwise stated, in-place
5200operations are allowed where source and destination are the same, but not where
5201they only partly overlap.
5202
5203The @code{mpn} functions are the base for the implementation of the
5204@code{mpz_}, @code{mpf_}, and @code{mpq_} functions.
5205
5206This example adds the number beginning at @var{s1p} and the number beginning at
5207@var{s2p} and writes the sum at @var{destp}. All areas have @var{n} limbs.
5208
5209@example
5210cy = mpn_add_n (destp, s1p, s2p, n)
5211@end example
5212
5213It should be noted that the @code{mpn} functions make no attempt to identify
5214high or low zero limbs on their operands, or other special forms. On random
5215data such cases will be unlikely and it'd be wasteful for every function to
5216check every time. An application knowing something about its data can take
5217steps to trim or perhaps split its calculations.
5218@c
5219@c For reference, within gmp mpz_t operands never have high zero limbs, and
5220@c we rate low zero limbs as unlikely too (or something an application should
5221@c handle). This is a prime motivation for not stripping zero limbs in say
5222@c mpn_mul_n etc.
5223@c
5224@c Other applications doing variable-length calculations will quite likely do
5225@c something similar to mpz. And even if not then it's highly likely zero
5226@c limb stripping can be done at just a few judicious points, which will be
5227@c more efficient than having lots of mpn functions checking every time.
5228
5229@sp 1
5230@noindent
5231In the notation used below, a source operand is identified by the pointer to
5232the least significant limb, and the limb count in braces. For example,
5233@{@var{s1p}, @var{s1n}@}.
5234
5235@deftypefun mp_limb_t mpn_add_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5236Add @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@}, and write the @var{n}
5237least significant limbs of the result to @var{rp}. Return carry, either 0 or
52381.
5239
5240This is the lowest-level function for addition. It is the preferred function
5241for addition, since it is written in assembly for most CPUs. For addition of
5242a variable to itself (i.e., @var{s1p} equals @var{s2p}) use @code{mpn_lshift}
5243with a count of 1 for optimal speed.
5244@end deftypefun
5245
5246@deftypefun mp_limb_t mpn_add_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
5247Add @{@var{s1p}, @var{n}@} and @var{s2limb}, and write the @var{n} least
5248significant limbs of the result to @var{rp}. Return carry, either 0 or 1.
5249@end deftypefun
5250
5251@deftypefun mp_limb_t mpn_add (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
5252Add @{@var{s1p}, @var{s1n}@} and @{@var{s2p}, @var{s2n}@}, and write the
5253@var{s1n} least significant limbs of the result to @var{rp}. Return carry,
5254either 0 or 1.
5255
5256This function requires that @var{s1n} is greater than or equal to @var{s2n}.
5257@end deftypefun
5258
5259@deftypefun mp_limb_t mpn_sub_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5260Subtract @{@var{s2p}, @var{n}@} from @{@var{s1p}, @var{n}@}, and write the
5261@var{n} least significant limbs of the result to @var{rp}. Return borrow,
5262either 0 or 1.
5263
5264This is the lowest-level function for subtraction. It is the preferred
5265function for subtraction, since it is written in assembly for most CPUs.
5266@end deftypefun
5267
5268@deftypefun mp_limb_t mpn_sub_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
5269Subtract @var{s2limb} from @{@var{s1p}, @var{n}@}, and write the @var{n} least
5270significant limbs of the result to @var{rp}. Return borrow, either 0 or 1.
5271@end deftypefun
5272
5273@deftypefun mp_limb_t mpn_sub (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
5274Subtract @{@var{s2p}, @var{s2n}@} from @{@var{s1p}, @var{s1n}@}, and write the
5275@var{s1n} least significant limbs of the result to @var{rp}. Return borrow,
5276either 0 or 1.
5277
5278This function requires that @var{s1n} is greater than or equal to
5279@var{s2n}.
5280@end deftypefun
5281
5282@deftypefun mp_limb_t mpn_neg (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
5283Perform the negation of @{@var{sp}, @var{n}@}, and write the result to
5284@{@var{rp}, @var{n}@}. This is equivalent to calling @code{mpn_sub_n} with a
5285@var{n}-limb zero minuend and passing @{@var{sp}, @var{n}@} as subtrahend.
5286Return borrow, either 0 or 1.
5287@end deftypefun
5288
5289@deftypefun void mpn_mul_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5290Multiply @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@}, and write the
52912*@var{n}-limb result to @var{rp}.
5292
5293The destination has to have space for 2*@var{n} limbs, even if the product's
5294most significant limb is zero. No overlap is permitted between the
5295destination and either source.
5296
5297If the two input operands are the same, use @code{mpn_sqr}.
5298@end deftypefun
5299
5300@deftypefun mp_limb_t mpn_mul (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
5301Multiply @{@var{s1p}, @var{s1n}@} and @{@var{s2p}, @var{s2n}@}, and write the
5302(@var{s1n}+@var{s2n})-limb result to @var{rp}. Return the most significant
5303limb of the result.
5304
5305The destination has to have space for @var{s1n} + @var{s2n} limbs, even if the
5306product's most significant limb is zero. No overlap is permitted between the
5307destination and either source.
5308
5309This function requires that @var{s1n} is greater than or equal to @var{s2n}.
5310@end deftypefun
5311
5312@deftypefun void mpn_sqr (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n})
5313Compute the square of @{@var{s1p}, @var{n}@} and write the 2*@var{n}-limb
5314result to @var{rp}.
5315
5316The destination has to have space for 2@var{n} limbs, even if the result's
5317most significant limb is zero. No overlap is permitted between the
5318destination and the source.
5319@end deftypefun
5320
5321@deftypefun mp_limb_t mpn_mul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
5322Multiply @{@var{s1p}, @var{n}@} by @var{s2limb}, and write the @var{n} least
5323significant limbs of the product to @var{rp}. Return the most significant
5324limb of the product. @{@var{s1p}, @var{n}@} and @{@var{rp}, @var{n}@} are
5325allowed to overlap provided @math{@var{rp} @le{} @var{s1p}}.
5326
5327This is a low-level function that is a building block for general
5328multiplication as well as other operations in GMP@. It is written in assembly
5329for most CPUs.
5330
5331Don't call this function if @var{s2limb} is a power of 2; use @code{mpn_lshift}
5332with a count equal to the logarithm of @var{s2limb} instead, for optimal speed.
5333@end deftypefun
5334
5335@deftypefun mp_limb_t mpn_addmul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
5336Multiply @{@var{s1p}, @var{n}@} and @var{s2limb}, and add the @var{n} least
5337significant limbs of the product to @{@var{rp}, @var{n}@} and write the result
5338to @var{rp}. Return the most significant limb of the product, plus carry-out
5339from the addition. @{@var{s1p}, @var{n}@} and @{@var{rp}, @var{n}@} are
5340allowed to overlap provided @math{@var{rp} @le{} @var{s1p}}.
5341
5342This is a low-level function that is a building block for general
5343multiplication as well as other operations in GMP@. It is written in assembly
5344for most CPUs.
5345@end deftypefun
5346
5347@deftypefun mp_limb_t mpn_submul_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n}, mp_limb_t @var{s2limb})
5348Multiply @{@var{s1p}, @var{n}@} and @var{s2limb}, and subtract the @var{n}
5349least significant limbs of the product from @{@var{rp}, @var{n}@} and write the
5350result to @var{rp}. Return the most significant limb of the product, plus
5351borrow-out from the subtraction. @{@var{s1p}, @var{n}@} and @{@var{rp},
5352@var{n}@} are allowed to overlap provided @math{@var{rp} @le{} @var{s1p}}.
5353
5354This is a low-level function that is a building block for general
5355multiplication and division as well as other operations in GMP@. It is written
5356in assembly for most CPUs.
5357@end deftypefun
5358
5359@deftypefun void mpn_tdiv_qr (mp_limb_t *@var{qp}, mp_limb_t *@var{rp}, mp_size_t @var{qxn}, const mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn})
5360Divide @{@var{np}, @var{nn}@} by @{@var{dp}, @var{dn}@} and put the quotient
5361at @{@var{qp}, @var{nn}@minus{}@var{dn}+1@} and the remainder at @{@var{rp},
5362@var{dn}@}. The quotient is rounded towards 0.
5363
5364No overlap is permitted between arguments, except that @var{np} might equal
5365@var{rp}. The dividend size @var{nn} must be greater than or equal to divisor
5366size @var{dn}. The most significant limb of the divisor must be non-zero. The
5367@var{qxn} operand must be zero.
5368@end deftypefun
5369
5370@deftypefun mp_limb_t mpn_divrem (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n})
5371[This function is obsolete. Please call @code{mpn_tdiv_qr} instead for best
5372performance.]
5373
5374Divide @{@var{rs2p}, @var{rs2n}@} by @{@var{s3p}, @var{s3n}@}, and write the
5375quotient at @var{r1p}, with the exception of the most significant limb, which
5376is returned. The remainder replaces the dividend at @var{rs2p}; it will be
5377@var{s3n} limbs long (i.e., as many limbs as the divisor).
5378
5379In addition to an integer quotient, @var{qxn} fraction limbs are developed, and
5380stored after the integral limbs. For most usages, @var{qxn} will be zero.
5381
5382It is required that @var{rs2n} is greater than or equal to @var{s3n}. It is
5383required that the most significant bit of the divisor is set.
5384
5385If the quotient is not needed, pass @var{rs2p} + @var{s3n} as @var{r1p}. Aside
5386from that special case, no overlap between arguments is permitted.
5387
5388Return the most significant limb of the quotient, either 0 or 1.
5389
5390The area at @var{r1p} needs to be @var{rs2n} @minus{} @var{s3n} + @var{qxn}
5391limbs large.
5392@end deftypefun
5393
5394@deftypefn Function mp_limb_t mpn_divrem_1 (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, @w{mp_limb_t *@var{s2p}}, mp_size_t @var{s2n}, mp_limb_t @var{s3limb})
5395@deftypefnx Macro mp_limb_t mpn_divmod_1 (mp_limb_t *@var{r1p}, mp_limb_t *@var{s2p}, @w{mp_size_t @var{s2n}}, @w{mp_limb_t @var{s3limb}})
5396Divide @{@var{s2p}, @var{s2n}@} by @var{s3limb}, and write the quotient at
5397@var{r1p}. Return the remainder.
5398
5399The integer quotient is written to @{@var{r1p}+@var{qxn}, @var{s2n}@} and in
5400addition @var{qxn} fraction limbs are developed and written to @{@var{r1p},
5401@var{qxn}@}. Either or both @var{s2n} and @var{qxn} can be zero. For most
5402usages, @var{qxn} will be zero.
5403
5404@code{mpn_divmod_1} exists for upward source compatibility and is simply a
5405macro calling @code{mpn_divrem_1} with a @var{qxn} of 0.
5406
5407The areas at @var{r1p} and @var{s2p} have to be identical or completely
5408separate, not partially overlapping.
5409@end deftypefn
5410
5411@deftypefun mp_limb_t mpn_divmod (mp_limb_t *@var{r1p}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n})
5412[This function is obsolete. Please call @code{mpn_tdiv_qr} instead for best
5413performance.]
5414@end deftypefun
5415
5416@deftypefun void mpn_divexact_1 (mp_limb_t * @var{rp}, const mp_limb_t * @var{sp}, mp_size_t @var{n}, mp_limb_t @var{d})
5417Divide @{@var{sp}, @var{n}@} by @var{d}, expecting it to divide exactly, and
5418writing the result to @{@var{rp}, @var{n}@}. If @var{d} doesn't divide
5419exactly, the value written to @{@var{rp}, @var{n}@} is undefined. The areas at
5420@var{rp} and @var{sp} have to be identical or completely separate, not
5421partially overlapping.
5422@end deftypefun
5423
5424@deftypefn Macro mp_limb_t mpn_divexact_by3 (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}})
5425@deftypefnx Function mp_limb_t mpn_divexact_by3c (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}}, mp_limb_t @var{carry})
5426Divide @{@var{sp}, @var{n}@} by 3, expecting it to divide exactly, and writing
5427the result to @{@var{rp}, @var{n}@}. If 3 divides exactly, the return value is
5428zero and the result is the quotient. If not, the return value is non-zero and
5429the result won't be anything useful.
5430
5431@code{mpn_divexact_by3c} takes an initial carry parameter, which can be the
5432return value from a previous call, so a large calculation can be done piece by
5433piece from low to high. @code{mpn_divexact_by3} is simply a macro calling
5434@code{mpn_divexact_by3c} with a 0 carry parameter.
5435
5436These routines use a multiply-by-inverse and will be faster than
5437@code{mpn_divrem_1} on CPUs with fast multiplication but slow division.
5438
5439The source @math{a}, result @math{q}, size @math{n}, initial carry @math{i},
5440and return value @math{c} satisfy @m{cb^n+a-i=3q, c*b^n + a-i = 3*q}, where
5441@m{b=2\GMPraise{@code{GMP\_NUMB\_BITS}}, b=2^GMP_NUMB_BITS}. The
5442return @math{c} is always 0, 1 or 2, and the initial carry @math{i} must also
5443be 0, 1 or 2 (these are both borrows really). When @math{c=0} clearly
5444@math{q=(a-i)/3}. When @m{c \neq 0, c!=0}, the remainder @math{(a-i) @bmod{}
54453} is given by @math{3-c}, because @math{b @equiv{} 1 @bmod{} 3} (when
5446@code{mp_bits_per_limb} is even, which is always so currently).
5447@end deftypefn
5448
5449@deftypefun mp_limb_t mpn_mod_1 (const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t @var{s2limb})
5450Divide @{@var{s1p}, @var{s1n}@} by @var{s2limb}, and return the remainder.
5451@var{s1n} can be zero.
5452@end deftypefun
5453
5454@deftypefun mp_limb_t mpn_lshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
5455Shift @{@var{sp}, @var{n}@} left by @var{count} bits, and write the result to
5456@{@var{rp}, @var{n}@}. The bits shifted out at the left are returned in the
5457least significant @var{count} bits of the return value (the rest of the return
5458value is zero).
5459
5460@var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1. The
5461regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
5462@math{@var{rp} @ge{} @var{sp}}.
5463
5464This function is written in assembly for most CPUs.
5465@end deftypefun
5466
5467@deftypefun mp_limb_t mpn_rshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
5468Shift @{@var{sp}, @var{n}@} right by @var{count} bits, and write the result to
5469@{@var{rp}, @var{n}@}. The bits shifted out at the right are returned in the
5470most significant @var{count} bits of the return value (the rest of the return
5471value is zero).
5472
5473@var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1. The
5474regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
5475@math{@var{rp} @le{} @var{sp}}.
5476
5477This function is written in assembly for most CPUs.
5478@end deftypefun
5479
5480@deftypefun int mpn_cmp (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5481Compare @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@} and return a
5482positive value if @math{@var{s1} > @var{s2}}, 0 if they are equal, or a
5483negative value if @math{@var{s1} < @var{s2}}.
5484@end deftypefun
5485
5486@deftypefun int mpn_zero_p (const mp_limb_t *@var{sp}, mp_size_t @var{n})
5487Test @{@var{sp}, @var{n}@} and return 1 if the operand is zero, 0 otherwise.
5488@end deftypefun
5489
5490@deftypefun mp_size_t mpn_gcd (mp_limb_t *@var{rp}, mp_limb_t *@var{xp}, mp_size_t @var{xn}, mp_limb_t *@var{yp}, mp_size_t @var{yn})
5491Set @{@var{rp}, @var{retval}@} to the greatest common divisor of @{@var{xp},
5492@var{xn}@} and @{@var{yp}, @var{yn}@}. The result can be up to @var{yn} limbs,
5493the return value is the actual number produced. Both source operands are
5494destroyed.
5495
5496It is required that @math{@var{xn} @ge @var{yn} > 0}, the most significant
5497limb of @{@var{yp}, @var{yn}@} must be non-zero, and at least one of
5498the two operands must be odd. No overlap is permitted
5499between @{@var{xp}, @var{xn}@} and @{@var{yp}, @var{yn}@}.
5500@end deftypefun
5501
5502@deftypefun mp_limb_t mpn_gcd_1 (const mp_limb_t *@var{xp}, mp_size_t @var{xn}, mp_limb_t @var{ylimb})
5503Return the greatest common divisor of @{@var{xp}, @var{xn}@} and @var{ylimb}.
5504Both operands must be non-zero.
5505@end deftypefun
5506
5507@deftypefun mp_size_t mpn_gcdext (mp_limb_t *@var{gp}, mp_limb_t *@var{sp}, mp_size_t *@var{sn}, mp_limb_t *@var{up}, mp_size_t @var{un}, mp_limb_t *@var{vp}, mp_size_t @var{vn})
5508Let @m{U,@var{U}} be defined by @{@var{up}, @var{un}@} and let @m{V,@var{V}} be
5509defined by @{@var{vp}, @var{vn}@}.
5510
5511Compute the greatest common divisor @math{G} of @math{U} and @math{V}. Compute
5512a cofactor @math{S} such that @math{G = US + VT}. The second cofactor @var{T}
5513is not computed but can easily be obtained from @m{(G - US) / V, (@var{G} -
5514@var{U}*@var{S}) / @var{V}} (the division will be exact). It is required that
5515@math{@var{un} @ge @var{vn} > 0}, and the most significant
5516limb of @{@var{vp}, @var{vn}@} must be non-zero.
5517
5518@math{S} satisfies @math{S = 1} or @math{@GMPabs{S} < V / (2 G)}. @math{S =
55190} if and only if @math{V} divides @math{U} (i.e., @math{G = V}).
5520
5521Store @math{G} at @var{gp} and let the return value define its limb count.
5522Store @math{S} at @var{sp} and let |*@var{sn}| define its limb count. @math{S}
5523can be negative; when this happens *@var{sn} will be negative. The area at
5524@var{gp} should have room for @var{vn} limbs and the area at @var{sp} should
5525have room for @math{@var{vn}+1} limbs.
5526
5527Both source operands are destroyed.
5528
5529Compatibility notes: GMP 4.3.0 and 4.3.1 defined @math{S} less strictly.
5530Earlier as well as later GMP releases define @math{S} as described here.
5531GMP releases before GMP 4.3.0 required additional space for both input and output
5532areas. More precisely, the areas @{@var{up}, @math{@var{un}+1}@} and
5533@{@var{vp}, @math{@var{vn}+1}@} were destroyed (i.e.@: the operands plus an
5534extra limb past the end of each), and the areas pointed to by @var{gp} and
5535@var{sp} should each have room for @math{@var{un}+1} limbs.
5536@end deftypefun
5537
5538@deftypefun mp_size_t mpn_sqrtrem (mp_limb_t *@var{r1p}, mp_limb_t *@var{r2p}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
5539Compute the square root of @{@var{sp}, @var{n}@} and put the result at
5540@{@var{r1p}, @math{@GMPceil{@var{n}/2}}@} and the remainder at @{@var{r2p},
5541@var{retval}@}. @var{r2p} needs space for @var{n} limbs, but the return value
5542indicates how many are produced.
5543
5544The most significant limb of @{@var{sp}, @var{n}@} must be non-zero. The
5545areas @{@var{r1p}, @math{@GMPceil{@var{n}/2}}@} and @{@var{sp}, @var{n}@} must
5546be completely separate. The areas @{@var{r2p}, @var{n}@} and @{@var{sp},
5547@var{n}@} must be either identical or completely separate.
5548
5549If the remainder is not wanted then @var{r2p} can be @code{NULL}, and in this
5550case the return value is zero or non-zero according to whether the remainder
5551would have been zero or non-zero.
5552
5553A return value of zero indicates a perfect square. See also
5554@code{mpn_perfect_square_p}.
5555@end deftypefun
5556
5557@deftypefun size_t mpn_sizeinbase (const mp_limb_t *@var{xp}, mp_size_t @var{n}, int @var{base})
5558Return the size of @{@var{xp},@var{n}@} measured in number of digits in the
5559given @var{base}. @var{base} can vary from 2 to 62. Requires @math{@var{n} > 0}
5560and @math{@var{xp}[@var{n}-1] > 0}. The result will be either exact or
55611 too big. If @var{base} is a power of 2, the result is always exact.
5562@end deftypefun
5563
5564@deftypefun mp_size_t mpn_get_str (unsigned char *@var{str}, int @var{base}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n})
5565Convert @{@var{s1p}, @var{s1n}@} to a raw unsigned char array at @var{str} in
5566base @var{base}, and return the number of characters produced. There may be
5567leading zeros in the string. The string is not in ASCII; to convert it to
5568printable format, add the ASCII codes for @samp{0} or @samp{A}, depending on
5569the base and range. @var{base} can vary from 2 to 256.
5570
5571The most significant limb of the input @{@var{s1p}, @var{s1n}@} must be
5572non-zero. The input @{@var{s1p}, @var{s1n}@} is clobbered, except when
5573@var{base} is a power of 2, in which case it's unchanged.
5574
5575The area at @var{str} has to have space for the largest possible number
5576represented by a @var{s1n} long limb array, plus one extra character.
5577@end deftypefun
5578
5579@deftypefun mp_size_t mpn_set_str (mp_limb_t *@var{rp}, const unsigned char *@var{str}, size_t @var{strsize}, int @var{base})
5580Convert bytes @{@var{str},@var{strsize}@} in the given @var{base} to limbs at
5581@var{rp}.
5582
5583@math{@var{str}[0]} is the most significant input byte and
5584@math{@var{str}[@var{strsize}-1]} is the least significant input byte. Each
5585byte should be a value in the range 0 to @math{@var{base}-1}, not an ASCII
5586character. @var{base} can vary from 2 to 256.
5587
5588The converted value is @{@var{rp},@var{rn}@} where @var{rn} is the return
5589value. If the most significant input byte @math{@var{str}[0]} is non-zero,
5590then @math{@var{rp}[@var{rn}-1]} will be non-zero, else
5591@math{@var{rp}[@var{rn}-1]} and some number of subsequent limbs may be zero.
5592
5593The area at @var{rp} has to have space for the largest possible number with
5594@var{strsize} digits in the chosen base, plus one extra limb.
5595
5596The input must have at least one byte, and no overlap is permitted between
5597@{@var{str},@var{strsize}@} and the result at @var{rp}.
5598@end deftypefun
5599
5600@deftypefun {mp_bitcnt_t} mpn_scan0 (const mp_limb_t *@var{s1p}, mp_bitcnt_t @var{bit})
5601Scan @var{s1p} from bit position @var{bit} for the next clear bit.
5602
5603It is required that there be a clear bit within the area at @var{s1p} at or
5604beyond bit position @var{bit}, so that the function has something to return.
5605@end deftypefun
5606
5607@deftypefun {mp_bitcnt_t} mpn_scan1 (const mp_limb_t *@var{s1p}, mp_bitcnt_t @var{bit})
5608Scan @var{s1p} from bit position @var{bit} for the next set bit.
5609
5610It is required that there be a set bit within the area at @var{s1p} at or
5611beyond bit position @var{bit}, so that the function has something to return.
5612@end deftypefun
5613
5614@deftypefun void mpn_random (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
5615@deftypefunx void mpn_random2 (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
5616Generate a random number of length @var{r1n} and store it at @var{r1p}. The
5617most significant limb is always non-zero. @code{mpn_random} generates
5618uniformly distributed limb data, @code{mpn_random2} generates long strings of
5619zeros and ones in the binary representation.
5620
5621@code{mpn_random2} is intended for testing the correctness of the @code{mpn}
5622routines.
5623@end deftypefun
5624
5625@deftypefun {mp_bitcnt_t} mpn_popcount (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
5626Count the number of set bits in @{@var{s1p}, @var{n}@}.
5627@end deftypefun
5628
5629@deftypefun {mp_bitcnt_t} mpn_hamdist (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5630Compute the hamming distance between @{@var{s1p}, @var{n}@} and @{@var{s2p},
5631@var{n}@}, which is the number of bit positions where the two operands have
5632different bit values.
5633@end deftypefun
5634
5635@deftypefun int mpn_perfect_square_p (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
5636Return non-zero iff @{@var{s1p}, @var{n}@} is a perfect square.
5637The most significant limb of the input @{@var{s1p}, @var{n}@} must be
5638non-zero.
5639@end deftypefun
5640
5641@deftypefun void mpn_and_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5642Perform the bitwise logical and of @{@var{s1p}, @var{n}@} and @{@var{s2p},
5643@var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
5644@end deftypefun
5645
5646@deftypefun void mpn_ior_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5647Perform the bitwise logical inclusive or of @{@var{s1p}, @var{n}@} and
5648@{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
5649@end deftypefun
5650
5651@deftypefun void mpn_xor_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5652Perform the bitwise logical exclusive or of @{@var{s1p}, @var{n}@} and
5653@{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
5654@end deftypefun
5655
5656@deftypefun void mpn_andn_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5657Perform the bitwise logical and of @{@var{s1p}, @var{n}@} and the bitwise
5658complement of @{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
5659@end deftypefun
5660
5661@deftypefun void mpn_iorn_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5662Perform the bitwise logical inclusive or of @{@var{s1p}, @var{n}@} and the bitwise
5663complement of @{@var{s2p}, @var{n}@}, and write the result to @{@var{rp}, @var{n}@}.
5664@end deftypefun
5665
5666@deftypefun void mpn_nand_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5667Perform the bitwise logical and of @{@var{s1p}, @var{n}@} and @{@var{s2p},
5668@var{n}@}, and write the bitwise complement of the result to @{@var{rp}, @var{n}@}.
5669@end deftypefun
5670
5671@deftypefun void mpn_nior_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5672Perform the bitwise logical inclusive or of @{@var{s1p}, @var{n}@} and
5673@{@var{s2p}, @var{n}@}, and write the bitwise complement of the result to
5674@{@var{rp}, @var{n}@}.
5675@end deftypefun
5676
5677@deftypefun void mpn_xnor_n (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5678Perform the bitwise logical exclusive or of @{@var{s1p}, @var{n}@} and
5679@{@var{s2p}, @var{n}@}, and write the bitwise complement of the result to
5680@{@var{rp}, @var{n}@}.
5681@end deftypefun
5682
5683@deftypefun void mpn_com (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
5684Perform the bitwise complement of @{@var{sp}, @var{n}@}, and write the result
5685to @{@var{rp}, @var{n}@}.
5686@end deftypefun
5687
5688@deftypefun void mpn_copyi (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n})
5689Copy from @{@var{s1p}, @var{n}@} to @{@var{rp}, @var{n}@}, increasingly.
5690@end deftypefun
5691
5692@deftypefun void mpn_copyd (mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, mp_size_t @var{n})
5693Copy from @{@var{s1p}, @var{n}@} to @{@var{rp}, @var{n}@}, decreasingly.
5694@end deftypefun
5695
5696@deftypefun void mpn_zero (mp_limb_t *@var{rp}, mp_size_t @var{n})
5697Zero @{@var{rp}, @var{n}@}.
5698@end deftypefun
5699
5700@sp 1
5701@section Low-level functions for cryptography
5702@cindex Low-level functions for cryptography
5703@cindex Cryptography functions, low-level
5704
5705The functions prefixed with @code{mpn_sec_} and @code{mpn_cnd_} are designed to
5706perform the exact same low-level operations and have the same cache access
5707patterns for any two same-size arguments, assuming that function arguments are
5708placed at the same position and that the machine state is identical upon
5709function entry. These functions are intended for cryptographic purposes, where
5710resilience to side-channel attacks is desired.
5711
5712These functions are less efficient than their ``leaky'' counterparts; their
5713performance for operands of the sizes typically used for cryptographic
5714applications is between 15% and 100% worse. For larger operands, these
5715functions might be inadequate, since they rely on asymptotically elementary
5716algorithms.
5717
5718These functions do not make any explicit allocations. Those of these functions
5719that need scratch space accept a scratch space operand. This convention allows
5720callers to keep sensitive data in designated memory areas. Note however that
5721compilers may choose to spill scalar values used within these functions to
5722their stack frame and that such scalars may contain sensitive data.
5723
5724In addition to these specially crafted functions, the following @code{mpn}
5725functions are naturally side-channel resistant: @code{mpn_add_n},
5726@code{mpn_sub_n}, @code{mpn_lshift}, @code{mpn_rshift}, @code{mpn_zero},
5727@code{mpn_copyi}, @code{mpn_copyd}, @code{mpn_com}, and the logical function
5728(@code{mpn_and_n}, etc).
5729
5730There are some exceptions from the side-channel resilience: (1) Some assembly
5731implementations of @code{mpn_lshift} identify shift-by-one as a special case.
5732This is a problem iff the shift count is a function of sensitive data. (2)
5733Alpha ev6 and Pentium4 using 64-bit limbs have leaky @code{mpn_add_n} and
5734@code{mpn_sub_n}. (3) Alpha ev6 has a leaky @code{mpn_mul_1} which also makes
5735@code{mpn_sec_mul} on those systems unsafe.
5736
5737@deftypefun mp_limb_t mpn_cnd_add_n (mp_limb_t @var{cnd}, mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5738@deftypefunx mp_limb_t mpn_cnd_sub_n (mp_limb_t @var{cnd}, mp_limb_t *@var{rp}, const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
5739These functions do conditional addition and subtraction. If @var{cnd} is
5740non-zero, they produce the same result as a regular @code{mpn_add_n} or
5741@code{mpn_sub_n}, and if @var{cnd} is zero, they copy @{@var{s1p},@var{n}@} to
5742the result area and return zero. The functions are designed to have timing and
5743memory access patterns depending only on size and location of the data areas,
5744but independent of the condition @var{cnd}. Like for @code{mpn_add_n} and
5745@code{mpn_sub_n}, on most machines, the timing will also be independent of the
5746actual limb values.
5747@end deftypefun
5748
5749@deftypefun mp_limb_t mpn_sec_add_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{n}, mp_limb_t @var{b}, mp_limb_t *@var{tp})
5750@deftypefunx mp_limb_t mpn_sec_sub_1 (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{n}, mp_limb_t @var{b}, mp_limb_t *@var{tp})
5751Set @var{R} to @var{A} + @var{b} or @var{A} - @var{b}, respectively, where
5752@var{R} = @{@var{rp},@var{n}@}, @var{A} = @{@var{ap},@var{n}@}, and @var{b} is
5753a single limb. Returns carry.
5754
5755These functions take @math{O(N)} time, unlike the leaky functions
5756@code{mpn_add_1} which are @math{O(1)} on average. They require scratch space
5757of @code{mpn_sec_add_1_itch(@var{n})} and @code{mpn_sec_sub_1_itch(@var{n})}
5758limbs, respectively, to be passed in the @var{tp} parameter. The scratch space
5759requirements are guaranteed to be at most @var{n} limbs, and increase
5760monotonously in the operand size.
5761@end deftypefun
5762
5763@deftypefun void mpn_cnd_swap (mp_limb_t @var{cnd}, volatile mp_limb_t *@var{ap}, volatile mp_limb_t *@var{bp}, mp_size_t @var{n})
5764If @var{cnd} is non-zero, swaps the contents of the areas @{@var{ap},@var{n}@}
5765and @{@var{bp},@var{n}@}. Otherwise, the areas are left unmodified.
5766Implemented using logical operations on the limbs, with the same memory
5767accesses independent of the value of @var{cnd}.
5768@end deftypefun
5769
5770@deftypefun void mpn_sec_mul (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{an}, const mp_limb_t *@var{bp}, mp_size_t @var{bn}, mp_limb_t *@var{tp})
5771@deftypefunx mp_size_t mpn_sec_mul_itch (mp_size_t @var{an}, mp_size_t @var{bn})
5772Set @var{R} to @math{A @times{} B}, where @var{A} = @{@var{ap},@var{an}@},
5773@var{B} = @{@var{bp},@var{bn}@}, and @var{R} =
5774@{@var{rp},@math{@var{an}+@var{bn}}@}.
5775
5776It is required that @math{@var{an} @ge @var{bn} > 0}.
5777
5778No overlapping between @var{R} and the input operands is allowed. For
5779@math{@var{A} = @var{B}}, use @code{mpn_sec_sqr} for optimal performance.
5780
5781This function requires scratch space of @code{mpn_sec_mul_itch(@var{an},
5782@var{bn})} limbs to be passed in the @var{tp} parameter. The scratch space
5783requirements are guaranteed to increase monotonously in the operand sizes.
5784@end deftypefun
5785
5786
5787@deftypefun void mpn_sec_sqr (mp_limb_t *@var{rp}, const mp_limb_t *@var{ap}, mp_size_t @var{an}, mp_limb_t *@var{tp})
5788@deftypefunx mp_size_t mpn_sec_sqr_itch (mp_size_t @var{an})
5789Set @var{R} to @math{A^2}, where @var{A} = @{@var{ap},@var{an}@}, and @var{R} =
5790@{@var{rp},@math{2@var{an}}@}.
5791
5792It is required that @math{@var{an} > 0}.
5793
5794No overlapping between @var{R} and the input operands is allowed.
5795
5796This function requires scratch space of @code{mpn_sec_sqr_itch(@var{an})} limbs
5797to be passed in the @var{tp} parameter. The scratch space requirements are
5798guaranteed to increase monotonously in the operand size.
5799@end deftypefun
5800
5801
5802@deftypefun void mpn_sec_powm (mp_limb_t *@var{rp}, const mp_limb_t *@var{bp}, mp_size_t @var{bn}, const mp_limb_t *@var{ep}, mp_bitcnt_t @var{enb}, const mp_limb_t *@var{mp}, mp_size_t @var{n}, mp_limb_t *@var{tp})
5803@deftypefunx mp_size_t mpn_sec_powm_itch (mp_size_t @var{bn}, mp_bitcnt_t @var{enb}, size_t @var{n})
5804Set @var{R} to @m{B^E \bmod @var{M}, (@var{B} raised to @var{E}) modulo
5805@var{M}}, where @var{R} = @{@var{rp},@var{n}@}, @var{M} = @{@var{mp},@var{n}@},
5806and @var{E} = @{@var{ep},@math{@GMPceil{@var{enb} /
5807@code{GMP\_NUMB\_BITS}}}@}.
5808
5809It is required that @math{@var{B} > 0}, that @math{@var{M} > 0} is odd, and
5810that @m{@var{E} < 2@GMPraise{@var{enb}}, @var{E} < 2^@var{enb}}, with @math{@var{enb} > 0}.
5811
5812No overlapping between @var{R} and the input operands is allowed.
5813
5814This function requires scratch space of @code{mpn_sec_powm_itch(@var{bn},
5815@var{enb}, @var{n})} limbs to be passed in the @var{tp} parameter. The scratch
5816space requirements are guaranteed to increase monotonously in the operand
5817sizes.
5818@end deftypefun
5819
5820@deftypefun void mpn_sec_tabselect (mp_limb_t *@var{rp}, const mp_limb_t *@var{tab}, mp_size_t @var{n}, mp_size_t @var{nents}, mp_size_t @var{which})
5821Select entry @var{which} from table @var{tab}, which has @var{nents} entries, each @var{n}
5822limbs. Store the selected entry at @var{rp}.
5823
5824This function reads the entire table to avoid side-channel information leaks.
5825@end deftypefun
5826
5827@deftypefun mp_limb_t mpn_sec_div_qr (mp_limb_t *@var{qp}, mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn}, mp_limb_t *@var{tp})
5828@deftypefunx mp_size_t mpn_sec_div_qr_itch (mp_size_t @var{nn}, mp_size_t @var{dn})
5829
5830Set @var{Q} to @m{\lfloor @var{N} / @var{D}\rfloor, the truncated quotient
5831@var{N} / @var{D}} and @var{R} to @m{@var{N} \bmod @var{D}, @var{N} modulo
5832@var{D}}, where @var{N} = @{@var{np},@var{nn}@}, @var{D} =
5833@{@var{dp},@var{dn}@}, @var{Q}'s most significant limb is the function return
5834value and the remaining limbs are @{@var{qp},@var{nn-dn}@}, and @var{R} =
5835@{@var{np},@var{dn}@}.
5836
5837It is required that @math{@var{nn} @ge @var{dn} @ge 1}, and that
5838@m{@var{dp}[@var{dn}-1] @neq 0, @var{dp}[@var{dn}-1] != 0}. This does not
5839imply that @math{@var{N} @ge @var{D}} since @var{N} might be zero-padded.
5840
5841Note the overlapping between @var{N} and @var{R}. No other operand overlapping
5842is allowed. The entire space occupied by @var{N} is overwritten.
5843
5844This function requires scratch space of @code{mpn_sec_div_qr_itch(@var{nn},
5845@var{dn})} limbs to be passed in the @var{tp} parameter.
5846@end deftypefun
5847
5848@deftypefun void mpn_sec_div_r (mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn}, mp_limb_t *@var{tp})
5849@deftypefunx mp_size_t mpn_sec_div_r_itch (mp_size_t @var{nn}, mp_size_t @var{dn})
5850
5851Set @var{R} to @m{@var{N} \bmod @var{D}, @var{N} modulo @var{D}}, where @var{N}
5852= @{@var{np},@var{nn}@}, @var{D} = @{@var{dp},@var{dn}@}, and @var{R} =
5853@{@var{np},@var{dn}@}.
5854
5855It is required that @math{@var{nn} @ge @var{dn} @ge 1}, and that
5856@m{@var{dp}[@var{dn}-1] @neq 0, @var{dp}[@var{dn}-1] != 0}. This does not
5857imply that @math{@var{N} @ge @var{D}} since @var{N} might be zero-padded.
5858
5859Note the overlapping between @var{N} and @var{R}. No other operand overlapping
5860is allowed. The entire space occupied by @var{N} is overwritten.
5861
5862This function requires scratch space of @code{mpn_sec_div_r_itch(@var{nn},
5863@var{dn})} limbs to be passed in the @var{tp} parameter.
5864@end deftypefun
5865
5866@deftypefun int mpn_sec_invert (mp_limb_t *@var{rp}, mp_limb_t *@var{ap}, const mp_limb_t *@var{mp}, mp_size_t @var{n}, mp_bitcnt_t @var{nbcnt}, mp_limb_t *@var{tp})
5867@deftypefunx mp_size_t mpn_sec_invert_itch (mp_size_t @var{n})
5868Set @var{R} to @m{@var{A}^{-1} \bmod @var{M}, the inverse of @var{A} modulo
5869@var{M}}, where @var{R} = @{@var{rp},@var{n}@}, @var{A} = @{@var{ap},@var{n}@},
5870and @var{M} = @{@var{mp},@var{n}@}. @strong{This function's interface is
5871preliminary.}
5872
5873If an inverse exists, return 1, otherwise return 0 and leave @var{R}
5874undefined. In either case, the input @var{A} is destroyed.
5875
5876It is required that @var{M} is odd, and that @math{@var{nbcnt} @ge
5877@GMPceil{\log(@var{A}+1)} + @GMPceil{\log(@var{M}+1)}}. A safe choice is
5878@m{@var{nbcnt} = 2@var{n} @times{} @code{GMP\_NUMB\_BITS}, @var{nbcnt} = 2
5879@times{} @var{n} @times{} GMP_NUMB_BITS}, but a smaller value might improve
5880performance if @var{M} or @var{A} are known to have leading zero bits.
5881
5882This function requires scratch space of @code{mpn_sec_invert_itch(@var{n})}
5883limbs to be passed in the @var{tp} parameter.
5884@end deftypefun
5885
5886
5887@sp 1
5888@section Nails
5889@cindex Nails
5890
5891@strong{Everything in this section is highly experimental and may disappear or
5892be subject to incompatible changes in a future version of GMP.}
5893
5894Nails are an experimental feature whereby a few bits are left unused at the
5895top of each @code{mp_limb_t}. This can significantly improve carry handling
5896on some processors.
5897
5898All the @code{mpn} functions accepting limb data will expect the nail bits to
5899be zero on entry, and will return data with the nails similarly all zero.
5900This applies both to limb vectors and to single limb arguments.
5901
5902Nails can be enabled by configuring with @samp{--enable-nails}. By default
5903the number of bits will be chosen according to what suits the host processor,
5904but a particular number can be selected with @samp{--enable-nails=N}.
5905
5906At the mpn level, a nail build is neither source nor binary compatible with a
5907non-nail build, strictly speaking. But programs acting on limbs only through
5908the mpn functions are likely to work equally well with either build, and
5909judicious use of the definitions below should make any program compatible with
5910either build, at the source level.
5911
5912For the higher level routines, meaning @code{mpz} etc, a nail build should be
5913fully source and binary compatible with a non-nail build.
5914
5915@defmac GMP_NAIL_BITS
5916@defmacx GMP_NUMB_BITS
5917@defmacx GMP_LIMB_BITS
5918@code{GMP_NAIL_BITS} is the number of nail bits, or 0 when nails are not in
5919use. @code{GMP_NUMB_BITS} is the number of data bits in a limb.
5920@code{GMP_LIMB_BITS} is the total number of bits in an @code{mp_limb_t}. In
5921all cases
5922
5923@example
5924GMP_LIMB_BITS == GMP_NAIL_BITS + GMP_NUMB_BITS
5925@end example
5926@end defmac
5927
5928@defmac GMP_NAIL_MASK
5929@defmacx GMP_NUMB_MASK
5930Bit masks for the nail and number parts of a limb. @code{GMP_NAIL_MASK} is 0
5931when nails are not in use.
5932
5933@code{GMP_NAIL_MASK} is not often needed, since the nail part can be obtained
5934with @code{x >> GMP_NUMB_BITS}, and that means one less large constant, which
5935can help various RISC chips.
5936@end defmac
5937
5938@defmac GMP_NUMB_MAX
5939The maximum value that can be stored in the number part of a limb. This is
5940the same as @code{GMP_NUMB_MASK}, but can be used for clarity when doing
5941comparisons rather than bit-wise operations.
5942@end defmac
5943
5944The term ``nails'' comes from finger or toe nails, which are at the ends of a
5945limb (arm or leg). ``numb'' is short for number, but is also how the
5946developers felt after trying for a long time to come up with sensible names
5947for these things.
5948
5949In the future (the distant future most likely) a non-zero nail might be
5950permitted, giving non-unique representations for numbers in a limb vector.
5951This would help vector processors since carries would only ever need to
5952propagate one or two limbs.
5953
5954
5955@node Random Number Functions, Formatted Output, Low-level Functions, Top
5956@chapter Random Number Functions
5957@cindex Random number functions
5958
5959Sequences of pseudo-random numbers in GMP are generated using a variable of
5960type @code{gmp_randstate_t}, which holds an algorithm selection and a current
5961state. Such a variable must be initialized by a call to one of the
5962@code{gmp_randinit} functions, and can be seeded with one of the
5963@code{gmp_randseed} functions.
5964
5965The functions actually generating random numbers are described in @ref{Integer
5966Random Numbers}, and @ref{Miscellaneous Float Functions}.
5967
5968The older style random number functions don't accept a @code{gmp_randstate_t}
5969parameter but instead share a global variable of that type. They use a
5970default algorithm and are currently not seeded (though perhaps that will
5971change in the future). The new functions accepting a @code{gmp_randstate_t}
5972are recommended for applications that care about randomness.
5973
5974@menu
5975* Random State Initialization::
5976* Random State Seeding::
5977* Random State Miscellaneous::
5978@end menu
5979
5980@node Random State Initialization, Random State Seeding, Random Number Functions, Random Number Functions
5981@section Random State Initialization
5982@cindex Random number state
5983@cindex Initialization functions
5984
5985@deftypefun void gmp_randinit_default (gmp_randstate_t @var{state})
5986Initialize @var{state} with a default algorithm. This will be a compromise
5987between speed and randomness, and is recommended for applications with no
5988special requirements. Currently this is @code{gmp_randinit_mt}.
5989@end deftypefun
5990
5991@deftypefun void gmp_randinit_mt (gmp_randstate_t @var{state})
5992@cindex Mersenne twister random numbers
5993Initialize @var{state} for a Mersenne Twister algorithm. This algorithm is
5994fast and has good randomness properties.
5995@end deftypefun
5996
5997@deftypefun void gmp_randinit_lc_2exp (gmp_randstate_t @var{state}, const mpz_t @var{a}, @w{unsigned long @var{c}}, @w{mp_bitcnt_t @var{m2exp}})
5998@cindex Linear congruential random numbers
5999Initialize @var{state} with a linear congruential algorithm @m{X = (@var{a}X +
6000@var{c}) @bmod 2^{m2exp}, X = (@var{a}*X + @var{c}) mod 2^@var{m2exp}}.
6001
6002The low bits of @math{X} in this algorithm are not very random. The least
6003significant bit will have a period no more than 2, and the second bit no more
6004than 4, etc. For this reason only the high half of each @math{X} is actually
6005used.
6006
6007When a random number of more than @math{@var{m2exp}/2} bits is to be
6008generated, multiple iterations of the recurrence are used and the results
6009concatenated.
6010@end deftypefun
6011
6012@deftypefun int gmp_randinit_lc_2exp_size (gmp_randstate_t @var{state}, mp_bitcnt_t @var{size})
6013@cindex Linear congruential random numbers
6014Initialize @var{state} for a linear congruential algorithm as per
6015@code{gmp_randinit_lc_2exp}. @var{a}, @var{c} and @var{m2exp} are selected
6016from a table, chosen so that @var{size} bits (or more) of each @math{X} will
6017be used, i.e.@: @math{@var{m2exp}/2 @ge{} @var{size}}.
6018
6019If successful the return value is non-zero. If @var{size} is bigger than the
6020table data provides then the return value is zero. The maximum @var{size}
6021currently supported is 128.
6022@end deftypefun
6023
6024@deftypefun void gmp_randinit_set (gmp_randstate_t @var{rop}, gmp_randstate_t @var{op})
6025Initialize @var{rop} with a copy of the algorithm and state from @var{op}.
6026@end deftypefun
6027
6028@c Although gmp_randinit, gmp_errno and related constants are obsolete, we
6029@c still put @findex entries for them, since they're still documented and
6030@c someone might be looking them up when perusing old application code.
6031
6032@deftypefun void gmp_randinit (gmp_randstate_t @var{state}, @w{gmp_randalg_t @var{alg}}, @dots{})
6033@strong{This function is obsolete.}
6034
6035@findex GMP_RAND_ALG_LC
6036@findex GMP_RAND_ALG_DEFAULT
6037Initialize @var{state} with an algorithm selected by @var{alg}. The only
6038choice is @code{GMP_RAND_ALG_LC}, which is @code{gmp_randinit_lc_2exp_size}
6039described above. A third parameter of type @code{unsigned long} is required,
6040this is the @var{size} for that function. @code{GMP_RAND_ALG_DEFAULT} or 0
6041are the same as @code{GMP_RAND_ALG_LC}.
6042
6043@c For reference, this is the only place gmp_errno has been documented, and
6044@c due to being non thread safe we won't be adding to it's uses.
6045@findex gmp_errno
6046@findex GMP_ERROR_UNSUPPORTED_ARGUMENT
6047@findex GMP_ERROR_INVALID_ARGUMENT
6048@code{gmp_randinit} sets bits in the global variable @code{gmp_errno} to
6049indicate an error. @code{GMP_ERROR_UNSUPPORTED_ARGUMENT} if @var{alg} is
6050unsupported, or @code{GMP_ERROR_INVALID_ARGUMENT} if the @var{size} parameter
6051is too big. It may be noted this error reporting is not thread safe (a good
6052reason to use @code{gmp_randinit_lc_2exp_size} instead).
6053@end deftypefun
6054
6055@deftypefun void gmp_randclear (gmp_randstate_t @var{state})
6056Free all memory occupied by @var{state}.
6057@end deftypefun
6058
6059
6060@node Random State Seeding, Random State Miscellaneous, Random State Initialization, Random Number Functions
6061@section Random State Seeding
6062@cindex Random number seeding
6063@cindex Seeding random numbers
6064
6065@deftypefun void gmp_randseed (gmp_randstate_t @var{state}, const mpz_t @var{seed})
6066@deftypefunx void gmp_randseed_ui (gmp_randstate_t @var{state}, @w{unsigned long int @var{seed}})
6067Set an initial seed value into @var{state}.
6068
6069The size of a seed determines how many different sequences of random numbers
6070that it's possible to generate. The ``quality'' of the seed is the randomness
6071of a given seed compared to the previous seed used, and this affects the
6072randomness of separate number sequences. The method for choosing a seed is
6073critical if the generated numbers are to be used for important applications,
6074such as generating cryptographic keys.
6075
6076Traditionally the system time has been used to seed, but care needs to be
6077taken with this. If an application seeds often and the resolution of the
6078system clock is low, then the same sequence of numbers might be repeated.
6079Also, the system time is quite easy to guess, so if unpredictability is
6080required then it should definitely not be the only source for the seed value.
6081On some systems there's a special device @file{/dev/random} which provides
6082random data better suited for use as a seed.
6083@end deftypefun
6084
6085
6086@node Random State Miscellaneous, , Random State Seeding, Random Number Functions
6087@section Random State Miscellaneous
6088
6089@deftypefun {unsigned long} gmp_urandomb_ui (gmp_randstate_t @var{state}, unsigned long @var{n})
6090Return a uniformly distributed random number of @var{n} bits, i.e.@: in the
6091range 0 to @m{2^n-1,2^@var{n}-1} inclusive. @var{n} must be less than or
6092equal to the number of bits in an @code{unsigned long}.
6093@end deftypefun
6094
6095@deftypefun {unsigned long} gmp_urandomm_ui (gmp_randstate_t @var{state}, unsigned long @var{n})
6096Return a uniformly distributed random number in the range 0 to
6097@math{@var{n}-1}, inclusive.
6098@end deftypefun
6099
6100
6101@node Formatted Output, Formatted Input, Random Number Functions, Top
6102@chapter Formatted Output
6103@cindex Formatted output
6104@cindex @code{printf} formatted output
6105
6106@menu
6107* Formatted Output Strings::
6108* Formatted Output Functions::
6109* C++ Formatted Output::
6110@end menu
6111
6112@node Formatted Output Strings, Formatted Output Functions, Formatted Output, Formatted Output
6113@section Format Strings
6114
6115@code{gmp_printf} and friends accept format strings similar to the standard C
6116@code{printf} (@pxref{Formatted Output,, Formatted Output, libc, The GNU C
6117Library Reference Manual}). A format specification is of the form
6118
6119@example
6120% [flags] [width] [.[precision]] [type] conv
6121@end example
6122
6123GMP adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}
6124and @code{mpf_t} respectively, @samp{M} for @code{mp_limb_t}, and @samp{N} for
6125an @code{mp_limb_t} array. @samp{Z}, @samp{Q}, @samp{M} and @samp{N} behave
6126like integers. @samp{Q} will print a @samp{/} and a denominator, if needed.
6127@samp{F} behaves like a float. For example,
6128
6129@example
6130mpz_t z;
6131gmp_printf ("%s is an mpz %Zd\n", "here", z);
6132
6133mpq_t q;
6134gmp_printf ("a hex rational: %#40Qx\n", q);
6135
6136mpf_t f;
6137int n;
6138gmp_printf ("fixed point mpf %.*Ff with %d digits\n", n, f, n);
6139
6140mp_limb_t l;
6141gmp_printf ("limb %Mu\n", l);
6142
6143const mp_limb_t *ptr;
6144mp_size_t size;
6145gmp_printf ("limb array %Nx\n", ptr, size);
6146@end example
6147
6148For @samp{N} the limbs are expected least significant first, as per the
6149@code{mpn} functions (@pxref{Low-level Functions}). A negative size can be
6150given to print the value as a negative.
6151
6152All the standard C @code{printf} types behave the same as the C library
6153@code{printf}, and can be freely intermixed with the GMP extensions. In the
6154current implementation the standard parts of the format string are simply
6155handed to @code{printf} and only the GMP extensions handled directly.
6156
6157The flags accepted are as follows. GLIBC style @nisamp{'} is only for the
6158standard C types (not the GMP types), and only if the C library supports it.
6159
6160@quotation
6161@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
6162@item @nicode{0} @tab pad with zeros (rather than spaces)
6163@item @nicode{#} @tab show the base with @samp{0x}, @samp{0X} or @samp{0}
6164@item @nicode{+} @tab always show a sign
6165@item (space) @tab show a space or a @samp{-} sign
6166@item @nicode{'} @tab group digits, GLIBC style (not GMP types)
6167@end multitable
6168@end quotation
6169
6170The optional width and precision can be given as a number within the format
6171string, or as a @samp{*} to take an extra parameter of type @code{int}, the
6172same as the standard @code{printf}.
6173
6174The standard types accepted are as follows. @samp{h} and @samp{l} are
6175portable, the rest will depend on the compiler (or include files) for the type
6176and the C library for the output.
6177
6178@quotation
6179@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
6180@item @nicode{h} @tab @nicode{short}
6181@item @nicode{hh} @tab @nicode{char}
6182@item @nicode{j} @tab @nicode{intmax_t} or @nicode{uintmax_t}
6183@item @nicode{l} @tab @nicode{long} or @nicode{wchar_t}
6184@item @nicode{ll} @tab @nicode{long long}
6185@item @nicode{L} @tab @nicode{long double}
6186@item @nicode{q} @tab @nicode{quad_t} or @nicode{u_quad_t}
6187@item @nicode{t} @tab @nicode{ptrdiff_t}
6188@item @nicode{z} @tab @nicode{size_t}
6189@end multitable
6190@end quotation
6191
6192@noindent
6193The GMP types are
6194
6195@quotation
6196@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
6197@item @nicode{F} @tab @nicode{mpf_t}, float conversions
6198@item @nicode{Q} @tab @nicode{mpq_t}, integer conversions
6199@item @nicode{M} @tab @nicode{mp_limb_t}, integer conversions
6200@item @nicode{N} @tab @nicode{mp_limb_t} array, integer conversions
6201@item @nicode{Z} @tab @nicode{mpz_t}, integer conversions
6202@end multitable
6203@end quotation
6204
6205The conversions accepted are as follows. @samp{a} and @samp{A} are always
6206supported for @code{mpf_t} but depend on the C library for standard C float
6207types. @samp{m} and @samp{p} depend on the C library.
6208
6209@quotation
6210@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
6211@item @nicode{a} @nicode{A} @tab hex floats, C99 style
6212@item @nicode{c} @tab character
6213@item @nicode{d} @tab decimal integer
6214@item @nicode{e} @nicode{E} @tab scientific format float
6215@item @nicode{f} @tab fixed point float
6216@item @nicode{i} @tab same as @nicode{d}
6217@item @nicode{g} @nicode{G} @tab fixed or scientific float
6218@item @nicode{m} @tab @code{strerror} string, GLIBC style
6219@item @nicode{n} @tab store characters written so far
6220@item @nicode{o} @tab octal integer
6221@item @nicode{p} @tab pointer
6222@item @nicode{s} @tab string
6223@item @nicode{u} @tab unsigned integer
6224@item @nicode{x} @nicode{X} @tab hex integer
6225@end multitable
6226@end quotation
6227
6228@samp{o}, @samp{x} and @samp{X} are unsigned for the standard C types, but for
6229types @samp{Z}, @samp{Q} and @samp{N} they are signed. @samp{u} is not
6230meaningful for @samp{Z}, @samp{Q} and @samp{N}.
6231
6232@samp{M} is a proxy for the C library @samp{l} or @samp{L}, according to the
6233size of @code{mp_limb_t}. Unsigned conversions will be usual, but a signed
6234conversion can be used and will interpret the value as a twos complement
6235negative.
6236
6237@samp{n} can be used with any type, even the GMP types.
6238
6239Other types or conversions that might be accepted by the C library
6240@code{printf} cannot be used through @code{gmp_printf}, this includes for
6241instance extensions registered with GLIBC @code{register_printf_function}.
6242Also currently there's no support for POSIX @samp{$} style numbered arguments
6243(perhaps this will be added in the future).
6244
6245The precision field has its usual meaning for integer @samp{Z} and float
6246@samp{F} types, but is currently undefined for @samp{Q} and should not be used
6247with that.
6248
6249@code{mpf_t} conversions only ever generate as many digits as can be
6250accurately represented by the operand, the same as @code{mpf_get_str} does.
6251Zeros will be used if necessary to pad to the requested precision. This
6252happens even for an @samp{f} conversion of an @code{mpf_t} which is an
6253integer, for instance @math{2^@W{1024}} in an @code{mpf_t} of 128 bits
6254precision will only produce about 40 digits, then pad with zeros to the
6255decimal point. An empty precision field like @samp{%.Fe} or @samp{%.Ff} can
6256be used to specifically request just the significant digits. Without any dot
6257and thus no precision field, a precision value of 6 will be used. Note that
6258these rules mean that @samp{%Ff}, @samp{%.Ff}, and @samp{%.0Ff} will all be
6259different.
6260
6261The decimal point character (or string) is taken from the current locale
6262settings on systems which provide @code{localeconv} (@pxref{Locales,, Locales
6263and Internationalization, libc, The GNU C Library Reference Manual}). The C
6264library will normally do the same for standard float output.
6265
6266The format string is only interpreted as plain @code{char}s, multibyte
6267characters are not recognised. Perhaps this will change in the future.
6268
6269
6270@node Formatted Output Functions, C++ Formatted Output, Formatted Output Strings, Formatted Output
6271@section Functions
6272@cindex Output functions
6273
6274Each of the following functions is similar to the corresponding C library
6275function. The basic @code{printf} forms take a variable argument list. The
6276@code{vprintf} forms take an argument pointer, see @ref{Variadic Functions,,
6277Variadic Functions, libc, The GNU C Library Reference Manual}, or @samp{man 3
6278va_start}.
6279
6280It should be emphasised that if a format string is invalid, or the arguments
6281don't match what the format specifies, then the behaviour of any of these
6282functions will be unpredictable. GCC format string checking is not available,
6283since it doesn't recognise the GMP extensions.
6284
6285The file based functions @code{gmp_printf} and @code{gmp_fprintf} will return
6286@math{-1} to indicate a write error. Output is not ``atomic'', so partial
6287output may be produced if a write error occurs. All the functions can return
6288@math{-1} if the C library @code{printf} variant in use returns @math{-1}, but
6289this shouldn't normally occur.
6290
6291@deftypefun int gmp_printf (const char *@var{fmt}, @dots{})
6292@deftypefunx int gmp_vprintf (const char *@var{fmt}, va_list @var{ap})
6293Print to the standard output @code{stdout}. Return the number of characters
6294written, or @math{-1} if an error occurred.
6295@end deftypefun
6296
6297@deftypefun int gmp_fprintf (FILE *@var{fp}, const char *@var{fmt}, @dots{})
6298@deftypefunx int gmp_vfprintf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})
6299Print to the stream @var{fp}. Return the number of characters written, or
6300@math{-1} if an error occurred.
6301@end deftypefun
6302
6303@deftypefun int gmp_sprintf (char *@var{buf}, const char *@var{fmt}, @dots{})
6304@deftypefunx int gmp_vsprintf (char *@var{buf}, const char *@var{fmt}, va_list @var{ap})
6305Form a null-terminated string in @var{buf}. Return the number of characters
6306written, excluding the terminating null.
6307
6308No overlap is permitted between the space at @var{buf} and the string
6309@var{fmt}.
6310
6311These functions are not recommended, since there's no protection against
6312exceeding the space available at @var{buf}.
6313@end deftypefun
6314
6315@deftypefun int gmp_snprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, @dots{})
6316@deftypefunx int gmp_vsnprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, va_list @var{ap})
6317Form a null-terminated string in @var{buf}. No more than @var{size} bytes
6318will be written. To get the full output, @var{size} must be enough for the
6319string and null-terminator.
6320
6321The return value is the total number of characters which ought to have been
6322produced, excluding the terminating null. If @math{@var{retval} @ge{}
6323@var{size}} then the actual output has been truncated to the first
6324@math{@var{size}-1} characters, and a null appended.
6325
6326No overlap is permitted between the region @{@var{buf},@var{size}@} and the
6327@var{fmt} string.
6328
6329Notice the return value is in ISO C99 @code{snprintf} style. This is so even
6330if the C library @code{vsnprintf} is the older GLIBC 2.0.x style.
6331@end deftypefun
6332
6333@deftypefun int gmp_asprintf (char **@var{pp}, const char *@var{fmt}, @dots{})
6334@deftypefunx int gmp_vasprintf (char **@var{pp}, const char *@var{fmt}, va_list @var{ap})
6335Form a null-terminated string in a block of memory obtained from the current
6336memory allocation function (@pxref{Custom Allocation}). The block will be the
6337size of the string and null-terminator. The address of the block in stored to
6338*@var{pp}. The return value is the number of characters produced, excluding
6339the null-terminator.
6340
6341Unlike the C library @code{asprintf}, @code{gmp_asprintf} doesn't return
6342@math{-1} if there's no more memory available, it lets the current allocation
6343function handle that.
6344@end deftypefun
6345
6346@deftypefun int gmp_obstack_printf (struct obstack *@var{ob}, const char *@var{fmt}, @dots{})
6347@deftypefunx int gmp_obstack_vprintf (struct obstack *@var{ob}, const char *@var{fmt}, va_list @var{ap})
6348@cindex @code{obstack} output
6349Append to the current object in @var{ob}. The return value is the number of
6350characters written. A null-terminator is not written.
6351
6352@var{fmt} cannot be within the current object in @var{ob}, since that object
6353might move as it grows.
6354
6355These functions are available only when the C library provides the obstack
6356feature, which probably means only on GNU systems, see @ref{Obstacks,,
6357Obstacks, libc, The GNU C Library Reference Manual}.
6358@end deftypefun
6359
6360
6361@node C++ Formatted Output, , Formatted Output Functions, Formatted Output
6362@section C++ Formatted Output
6363@cindex C++ @code{ostream} output
6364@cindex @code{ostream} output
6365
6366The following functions are provided in @file{libgmpxx} (@pxref{Headers and
6367Libraries}), which is built if C++ support is enabled (@pxref{Build Options}).
6368Prototypes are available from @code{<gmp.h>}.
6369
6370@deftypefun ostream& operator<< (ostream& @var{stream}, const mpz_t @var{op})
6371Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
6372@code{ios::width} is reset to 0 after output, the same as the standard
6373@code{ostream operator<<} routines do.
6374
6375In hex or octal, @var{op} is printed as a signed number, the same as for
6376decimal. This is unlike the standard @code{operator<<} routines on @code{int}
6377etc, which instead give twos complement.
6378@end deftypefun
6379
6380@deftypefun ostream& operator<< (ostream& @var{stream}, const mpq_t @var{op})
6381Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
6382@code{ios::width} is reset to 0 after output, the same as the standard
6383@code{ostream operator<<} routines do.
6384
6385Output will be a fraction like @samp{5/9}, or if the denominator is 1 then
6386just a plain integer like @samp{123}.
6387
6388In hex or octal, @var{op} is printed as a signed value, the same as for
6389decimal. If @code{ios::showbase} is set then a base indicator is shown on
6390both the numerator and denominator (if the denominator is required).
6391@end deftypefun
6392
6393@deftypefun ostream& operator<< (ostream& @var{stream}, const mpf_t @var{op})
6394Print @var{op} to @var{stream}, using its @code{ios} formatting settings.
6395@code{ios::width} is reset to 0 after output, the same as the standard
6396@code{ostream operator<<} routines do.
6397
6398The decimal point follows the standard library float @code{operator<<}, which
6399on recent systems means the @code{std::locale} imbued on @var{stream}.
6400
6401Hex and octal are supported, unlike the standard @code{operator<<} on
6402@code{double}. The mantissa will be in hex or octal, the exponent will be in
6403decimal. For hex the exponent delimiter is an @samp{@@}. This is as per
6404@code{mpf_out_str}.
6405
6406@code{ios::showbase} is supported, and will put a base on the mantissa, for
6407example hex @samp{0x1.8} or @samp{0x0.8}, or octal @samp{01.4} or @samp{00.4}.
6408This last form is slightly strange, but at least differentiates itself from
6409decimal.
6410@end deftypefun
6411
6412These operators mean that GMP types can be printed in the usual C++ way, for
6413example,
6414
6415@example
6416mpz_t z;
6417int n;
6418...
6419cout << "iteration " << n << " value " << z << "\n";
6420@end example
6421
6422But note that @code{ostream} output (and @code{istream} input, @pxref{C++
6423Formatted Input}) is the only overloading available for the GMP types and that
6424for instance using @code{+} with an @code{mpz_t} will have unpredictable
6425results. For classes with overloading, see @ref{C++ Class Interface}.
6426
6427
6428@node Formatted Input, C++ Class Interface, Formatted Output, Top
6429@chapter Formatted Input
6430@cindex Formatted input
6431@cindex @code{scanf} formatted input
6432
6433@menu
6434* Formatted Input Strings::
6435* Formatted Input Functions::
6436* C++ Formatted Input::
6437@end menu
6438
6439
6440@node Formatted Input Strings, Formatted Input Functions, Formatted Input, Formatted Input
6441@section Formatted Input Strings
6442
6443@code{gmp_scanf} and friends accept format strings similar to the standard C
6444@code{scanf} (@pxref{Formatted Input,, Formatted Input, libc, The GNU C
6445Library Reference Manual}). A format specification is of the form
6446
6447@example
6448% [flags] [width] [type] conv
6449@end example
6450
6451GMP adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}
6452and @code{mpf_t} respectively. @samp{Z} and @samp{Q} behave like integers.
6453@samp{Q} will read a @samp{/} and a denominator, if present. @samp{F} behaves
6454like a float.
6455
6456GMP variables don't require an @code{&} when passed to @code{gmp_scanf}, since
6457they're already ``call-by-reference''. For example,
6458
6459@example
6460/* to read say "a(5) = 1234" */
6461int n;
6462mpz_t z;
6463gmp_scanf ("a(%d) = %Zd\n", &n, z);
6464
6465mpq_t q1, q2;
6466gmp_sscanf ("0377 + 0x10/0x11", "%Qi + %Qi", q1, q2);
6467
6468/* to read say "topleft (1.55,-2.66)" */
6469mpf_t x, y;
6470char buf[32];
6471gmp_scanf ("%31s (%Ff,%Ff)", buf, x, y);
6472@end example
6473
6474All the standard C @code{scanf} types behave the same as in the C library
6475@code{scanf}, and can be freely intermixed with the GMP extensions. In the
6476current implementation the standard parts of the format string are simply
6477handed to @code{scanf} and only the GMP extensions handled directly.
6478
6479The flags accepted are as follows. @samp{a} and @samp{'} will depend on
6480support from the C library, and @samp{'} cannot be used with GMP types.
6481
6482@quotation
6483@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
6484@item @nicode{*} @tab read but don't store
6485@item @nicode{a} @tab allocate a buffer (string conversions)
6486@item @nicode{'} @tab grouped digits, GLIBC style (not GMP types)
6487@end multitable
6488@end quotation
6489
6490The standard types accepted are as follows. @samp{h} and @samp{l} are
6491portable, the rest will depend on the compiler (or include files) for the type
6492and the C library for the input.
6493
6494@quotation
6495@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
6496@item @nicode{h} @tab @nicode{short}
6497@item @nicode{hh} @tab @nicode{char}
6498@item @nicode{j} @tab @nicode{intmax_t} or @nicode{uintmax_t}
6499@item @nicode{l} @tab @nicode{long int}, @nicode{double} or @nicode{wchar_t}
6500@item @nicode{ll} @tab @nicode{long long}
6501@item @nicode{L} @tab @nicode{long double}
6502@item @nicode{q} @tab @nicode{quad_t} or @nicode{u_quad_t}
6503@item @nicode{t} @tab @nicode{ptrdiff_t}
6504@item @nicode{z} @tab @nicode{size_t}
6505@end multitable
6506@end quotation
6507
6508@noindent
6509The GMP types are
6510
6511@quotation
6512@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
6513@item @nicode{F} @tab @nicode{mpf_t}, float conversions
6514@item @nicode{Q} @tab @nicode{mpq_t}, integer conversions
6515@item @nicode{Z} @tab @nicode{mpz_t}, integer conversions
6516@end multitable
6517@end quotation
6518
6519The conversions accepted are as follows. @samp{p} and @samp{[} will depend on
6520support from the C library, the rest are standard.
6521
6522@quotation
6523@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
6524@item @nicode{c} @tab character or characters
6525@item @nicode{d} @tab decimal integer
6526@item @nicode{e} @nicode{E} @nicode{f} @nicode{g} @nicode{G}
6527 @tab float
6528@item @nicode{i} @tab integer with base indicator
6529@item @nicode{n} @tab characters read so far
6530@item @nicode{o} @tab octal integer
6531@item @nicode{p} @tab pointer
6532@item @nicode{s} @tab string of non-whitespace characters
6533@item @nicode{u} @tab decimal integer
6534@item @nicode{x} @nicode{X} @tab hex integer
6535@item @nicode{[} @tab string of characters in a set
6536@end multitable
6537@end quotation
6538
6539@samp{e}, @samp{E}, @samp{f}, @samp{g} and @samp{G} are identical, they all
6540read either fixed point or scientific format, and either upper or lower case
6541@samp{e} for the exponent in scientific format.
6542
6543C99 style hex float format (@code{printf %a}, @pxref{Formatted Output
6544Strings}) is always accepted for @code{mpf_t}, but for the standard float
6545types it will depend on the C library.
6546
6547@samp{x} and @samp{X} are identical, both accept both upper and lower case
6548hexadecimal.
6549
6550@samp{o}, @samp{u}, @samp{x} and @samp{X} all read positive or negative
6551values. For the standard C types these are described as ``unsigned''
6552conversions, but that merely affects certain overflow handling, negatives are
6553still allowed (per @code{strtoul}, @pxref{Parsing of Integers,, Parsing of
6554Integers, libc, The GNU C Library Reference Manual}). For GMP types there are
6555no overflows, so @samp{d} and @samp{u} are identical.
6556
6557@samp{Q} type reads the numerator and (optional) denominator as given. If the
6558value might not be in canonical form then @code{mpq_canonicalize} must be
6559called before using it in any calculations (@pxref{Rational Number
6560Functions}).
6561
6562@samp{Qi} will read a base specification separately for the numerator and
6563denominator. For example @samp{0x10/11} would be 16/11, whereas
6564@samp{0x10/0x11} would be 16/17.
6565
6566@samp{n} can be used with any of the types above, even the GMP types.
6567@samp{*} to suppress assignment is allowed, though in that case it would do
6568nothing at all.
6569
6570Other conversions or types that might be accepted by the C library
6571@code{scanf} cannot be used through @code{gmp_scanf}.
6572
6573Whitespace is read and discarded before a field, except for @samp{c} and
6574@samp{[} conversions.
6575
6576For float conversions, the decimal point character (or string) expected is
6577taken from the current locale settings on systems which provide
6578@code{localeconv} (@pxref{Locales,, Locales and Internationalization, libc,
6579The GNU C Library Reference Manual}). The C library will normally do the same
6580for standard float input.
6581
6582The format string is only interpreted as plain @code{char}s, multibyte
6583characters are not recognised. Perhaps this will change in the future.
6584
6585
6586@node Formatted Input Functions, C++ Formatted Input, Formatted Input Strings, Formatted Input
6587@section Formatted Input Functions
6588@cindex Input functions
6589
6590Each of the following functions is similar to the corresponding C library
6591function. The plain @code{scanf} forms take a variable argument list. The
6592@code{vscanf} forms take an argument pointer, see @ref{Variadic Functions,,
6593Variadic Functions, libc, The GNU C Library Reference Manual}, or @samp{man 3
6594va_start}.
6595
6596It should be emphasised that if a format string is invalid, or the arguments
6597don't match what the format specifies, then the behaviour of any of these
6598functions will be unpredictable. GCC format string checking is not available,
6599since it doesn't recognise the GMP extensions.
6600
6601No overlap is permitted between the @var{fmt} string and any of the results
6602produced.
6603
6604@deftypefun int gmp_scanf (const char *@var{fmt}, @dots{})
6605@deftypefunx int gmp_vscanf (const char *@var{fmt}, va_list @var{ap})
6606Read from the standard input @code{stdin}.
6607@end deftypefun
6608
6609@deftypefun int gmp_fscanf (FILE *@var{fp}, const char *@var{fmt}, @dots{})
6610@deftypefunx int gmp_vfscanf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})
6611Read from the stream @var{fp}.
6612@end deftypefun
6613
6614@deftypefun int gmp_sscanf (const char *@var{s}, const char *@var{fmt}, @dots{})
6615@deftypefunx int gmp_vsscanf (const char *@var{s}, const char *@var{fmt}, va_list @var{ap})
6616Read from a null-terminated string @var{s}.
6617@end deftypefun
6618
6619The return value from each of these functions is the same as the standard C99
6620@code{scanf}, namely the number of fields successfully parsed and stored.
6621@samp{%n} fields and fields read but suppressed by @samp{*} don't count
6622towards the return value.
6623
6624If end of input (or a file error) is reached before a character for a field or
6625a literal, and if no previous non-suppressed fields have matched, then the
6626return value is @code{EOF} instead of 0. A whitespace character in the format
6627string is only an optional match and doesn't induce an @code{EOF} in this
6628fashion. Leading whitespace read and discarded for a field don't count as
6629characters for that field.
6630
6631For the GMP types, input parsing follows C99 rules, namely one character of
6632lookahead is used and characters are read while they continue to meet the
6633format requirements. If this doesn't provide a complete number then the
6634function terminates, with that field not stored nor counted towards the return
6635value. For instance with @code{mpf_t} an input @samp{1.23e-XYZ} would be read
6636up to the @samp{X} and that character pushed back since it's not a digit. The
6637string @samp{1.23e-} would then be considered invalid since an @samp{e} must
6638be followed by at least one digit.
6639
6640For the standard C types, in the current implementation GMP calls the C
6641library @code{scanf} functions, which might have looser rules about what
6642constitutes a valid input.
6643
6644Note that @code{gmp_sscanf} is the same as @code{gmp_fscanf} and only does one
6645character of lookahead when parsing. Although clearly it could look at its
6646entire input, it is deliberately made identical to @code{gmp_fscanf}, the same
6647way C99 @code{sscanf} is the same as @code{fscanf}.
6648
6649
6650@node C++ Formatted Input, , Formatted Input Functions, Formatted Input
6651@section C++ Formatted Input
6652@cindex C++ @code{istream} input
6653@cindex @code{istream} input
6654
6655The following functions are provided in @file{libgmpxx} (@pxref{Headers and
6656Libraries}), which is built only if C++ support is enabled (@pxref{Build
6657Options}). Prototypes are available from @code{<gmp.h>}.
6658
6659@deftypefun istream& operator>> (istream& @var{stream}, mpz_t @var{rop})
6660Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.
6661@end deftypefun
6662
6663@deftypefun istream& operator>> (istream& @var{stream}, mpq_t @var{rop})
6664An integer like @samp{123} will be read, or a fraction like @samp{5/9}. No
6665whitespace is allowed around the @samp{/}. If the fraction is not in
6666canonical form then @code{mpq_canonicalize} must be called (@pxref{Rational
6667Number Functions}) before operating on it.
6668
6669As per integer input, an @samp{0} or @samp{0x} base indicator is read when
6670none of @code{ios::dec}, @code{ios::oct} or @code{ios::hex} are set. This is
6671done separately for numerator and denominator, so that for instance
6672@samp{0x10/11} is @math{16/11} and @samp{0x10/0x11} is @math{16/17}.
6673@end deftypefun
6674
6675@deftypefun istream& operator>> (istream& @var{stream}, mpf_t @var{rop})
6676Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.
6677
6678Hex or octal floats are not supported, but might be in the future, or perhaps
6679it's best to accept only what the standard float @code{operator>>} does.
6680@end deftypefun
6681
6682Note that digit grouping specified by the @code{istream} locale is currently
6683not accepted. Perhaps this will change in the future.
6684
6685@sp 1
6686These operators mean that GMP types can be read in the usual C++ way, for
6687example,
6688
6689@example
6690mpz_t z;
6691...
6692cin >> z;
6693@end example
6694
6695But note that @code{istream} input (and @code{ostream} output, @pxref{C++
6696Formatted Output}) is the only overloading available for the GMP types and
6697that for instance using @code{+} with an @code{mpz_t} will have unpredictable
6698results. For classes with overloading, see @ref{C++ Class Interface}.
6699
6700
6701
6702@node C++ Class Interface, Custom Allocation, Formatted Input, Top
6703@chapter C++ Class Interface
6704@cindex C++ interface
6705
6706This chapter describes the C++ class based interface to GMP.
6707
6708All GMP C language types and functions can be used in C++ programs, since
6709@file{gmp.h} has @code{extern "C"} qualifiers, but the class interface offers
6710overloaded functions and operators which may be more convenient.
6711
6712Due to the implementation of this interface, a reasonably recent C++ compiler
6713is required, one supporting namespaces, partial specialization of templates
6714and member templates.
6715
6716@strong{Everything described in this chapter is to be considered preliminary
6717and might be subject to incompatible changes if some unforeseen difficulty
6718reveals itself.}
6719
6720@menu
6721* C++ Interface General::
6722* C++ Interface Integers::
6723* C++ Interface Rationals::
6724* C++ Interface Floats::
6725* C++ Interface Random Numbers::
6726* C++ Interface Limitations::
6727@end menu
6728
6729
6730@node C++ Interface General, C++ Interface Integers, C++ Class Interface, C++ Class Interface
6731@section C++ Interface General
6732
6733@noindent
6734All the C++ classes and functions are available with
6735
6736@cindex @code{gmpxx.h}
6737@example
6738#include <gmpxx.h>
6739@end example
6740
6741Programs should be linked with the @file{libgmpxx} and @file{libgmp}
6742libraries. For example,
6743
6744@example
6745g++ mycxxprog.cc -lgmpxx -lgmp
6746@end example
6747
6748@noindent
6749The classes defined are
6750
6751@deftp Class mpz_class
6752@deftpx Class mpq_class
6753@deftpx Class mpf_class
6754@end deftp
6755
6756The standard operators and various standard functions are overloaded to allow
6757arithmetic with these classes. For example,
6758
6759@example
6760int
6761main (void)
6762@{
6763 mpz_class a, b, c;
6764
6765 a = 1234;
6766 b = "-5678";
6767 c = a+b;
6768 cout << "sum is " << c << "\n";
6769 cout << "absolute value is " << abs(c) << "\n";
6770
6771 return 0;
6772@}
6773@end example
6774
6775An important feature of the implementation is that an expression like
6776@code{a=b+c} results in a single call to the corresponding @code{mpz_add},
6777without using a temporary for the @code{b+c} part. Expressions which by their
6778nature imply intermediate values, like @code{a=b*c+d*e}, still use temporaries
6779though.
6780
6781The classes can be freely intermixed in expressions, as can the classes and
6782the standard types @code{long}, @code{unsigned long} and @code{double}.
6783Smaller types like @code{int} or @code{float} can also be intermixed, since
6784C++ will promote them.
6785
6786Note that @code{bool} is not accepted directly, but must be explicitly cast to
6787an @code{int} first. This is because C++ will automatically convert any
6788pointer to a @code{bool}, so if GMP accepted @code{bool} it would make all
6789sorts of invalid class and pointer combinations compile but almost certainly
6790not do anything sensible.
6791
6792Conversions back from the classes to standard C++ types aren't done
6793automatically, instead member functions like @code{get_si} are provided (see
6794the following sections for details).
6795
6796Also there are no automatic conversions from the classes to the corresponding
6797GMP C types, instead a reference to the underlying C object can be obtained
6798with the following functions,
6799
6800@deftypefun mpz_t mpz_class::get_mpz_t ()
6801@deftypefunx mpq_t mpq_class::get_mpq_t ()
6802@deftypefunx mpf_t mpf_class::get_mpf_t ()
6803@end deftypefun
6804
6805These can be used to call a C function which doesn't have a C++ class
6806interface. For example to set @code{a} to the GCD of @code{b} and @code{c},
6807
6808@example
6809mpz_class a, b, c;
6810...
6811mpz_gcd (a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t());
6812@end example
6813
6814In the other direction, a class can be initialized from the corresponding GMP
6815C type, or assigned to if an explicit constructor is used. In both cases this
6816makes a copy of the value, it doesn't create any sort of association. For
6817example,
6818
6819@example
6820mpz_t z;
6821// ... init and calculate z ...
6822mpz_class x(z);
6823mpz_class y;
6824y = mpz_class (z);
6825@end example
6826
6827There are no namespace setups in @file{gmpxx.h}, all types and functions are
6828simply put into the global namespace. This is what @file{gmp.h} has done in
6829the past, and continues to do for compatibility. The extras provided by
6830@file{gmpxx.h} follow GMP naming conventions and are unlikely to clash with
6831anything.
6832
6833
6834@node C++ Interface Integers, C++ Interface Rationals, C++ Interface General, C++ Class Interface
6835@section C++ Interface Integers
6836
6837@deftypefun {} mpz_class::mpz_class (type @var{n})
6838Construct an @code{mpz_class}. All the standard C++ types may be used, except
6839@code{long long} and @code{long double}, and all the GMP C++ classes can be
6840used, although conversions from @code{mpq_class} and @code{mpf_class} are
6841@code{explicit}. Any necessary conversion follows the corresponding C
6842function, for example @code{double} follows @code{mpz_set_d}
6843(@pxref{Assigning Integers}).
6844@end deftypefun
6845
6846@deftypefun explicit mpz_class::mpz_class (const mpz_t @var{z})
6847Construct an @code{mpz_class} from an @code{mpz_t}. The value in @var{z} is
6848copied into the new @code{mpz_class}, there won't be any permanent association
6849between it and @var{z}.
6850@end deftypefun
6851
6852@deftypefun explicit mpz_class::mpz_class (const char *@var{s}, int @var{base} = 0)
6853@deftypefunx explicit mpz_class::mpz_class (const string& @var{s}, int @var{base} = 0)
6854Construct an @code{mpz_class} converted from a string using @code{mpz_set_str}
6855(@pxref{Assigning Integers}).
6856
6857If the string is not a valid integer, an @code{std::invalid_argument}
6858exception is thrown. The same applies to @code{operator=}.
6859@end deftypefun
6860
6861@deftypefun mpz_class operator"" _mpz (const char *@var{str})
6862With C++11 compilers, integers can be constructed with the syntax
6863@code{123_mpz} which is equivalent to @code{mpz_class("123")}.
6864@end deftypefun
6865
6866@deftypefun mpz_class operator/ (mpz_class @var{a}, mpz_class @var{d})
6867@deftypefunx mpz_class operator% (mpz_class @var{a}, mpz_class @var{d})
6868Divisions involving @code{mpz_class} round towards zero, as per the
6869@code{mpz_tdiv_q} and @code{mpz_tdiv_r} functions (@pxref{Integer Division}).
6870This is the same as the C99 @code{/} and @code{%} operators.
6871
6872The @code{mpz_fdiv@dots{}} or @code{mpz_cdiv@dots{}} functions can always be called
6873directly if desired. For example,
6874
6875@example
6876mpz_class q, a, d;
6877...
6878mpz_fdiv_q (q.get_mpz_t(), a.get_mpz_t(), d.get_mpz_t());
6879@end example
6880@end deftypefun
6881
6882@deftypefun mpz_class abs (mpz_class @var{op})
6883@deftypefunx int cmp (mpz_class @var{op1}, type @var{op2})
6884@deftypefunx int cmp (type @var{op1}, mpz_class @var{op2})
6885@maybepagebreak
6886@deftypefunx bool mpz_class::fits_sint_p (void)
6887@deftypefunx bool mpz_class::fits_slong_p (void)
6888@deftypefunx bool mpz_class::fits_sshort_p (void)
6889@maybepagebreak
6890@deftypefunx bool mpz_class::fits_uint_p (void)
6891@deftypefunx bool mpz_class::fits_ulong_p (void)
6892@deftypefunx bool mpz_class::fits_ushort_p (void)
6893@maybepagebreak
6894@deftypefunx double mpz_class::get_d (void)
6895@deftypefunx long mpz_class::get_si (void)
6896@deftypefunx string mpz_class::get_str (int @var{base} = 10)
6897@deftypefunx {unsigned long} mpz_class::get_ui (void)
6898@maybepagebreak
6899@deftypefunx int mpz_class::set_str (const char *@var{str}, int @var{base})
6900@deftypefunx int mpz_class::set_str (const string& @var{str}, int @var{base})
6901@deftypefunx int sgn (mpz_class @var{op})
6902@deftypefunx mpz_class sqrt (mpz_class @var{op})
6903@maybepagebreak
6904@deftypefunx mpz_class gcd (mpz_class @var{op1}, mpz_class @var{op2})
6905@deftypefunx mpz_class lcm (mpz_class @var{op1}, mpz_class @var{op2})
6906@deftypefunx mpz_class mpz_class::factorial (type @var{op})
6907@deftypefunx mpz_class factorial (mpz_class @var{op})
6908@deftypefunx mpz_class mpz_class::primorial (type @var{op})
6909@deftypefunx mpz_class primorial (mpz_class @var{op})
6910@deftypefunx mpz_class mpz_class::fibonacci (type @var{op})
6911@deftypefunx mpz_class fibonacci (mpz_class @var{op})
6912@maybepagebreak
6913@deftypefunx void mpz_class::swap (mpz_class& @var{op})
6914@deftypefunx void swap (mpz_class& @var{op1}, mpz_class& @var{op2})
6915These functions provide a C++ class interface to the corresponding GMP C
6916routines. Calling @code{factorial} or @code{primorial} on a negative number
6917is undefined.
6918
6919@code{cmp} can be used with any of the classes or the standard C++ types,
6920except @code{long long} and @code{long double}.
6921@end deftypefun
6922
6923@sp 1
6924Overloaded operators for combinations of @code{mpz_class} and @code{double}
6925are provided for completeness, but it should be noted that if the given
6926@code{double} is not an integer then the way any rounding is done is currently
6927unspecified. The rounding might take place at the start, in the middle, or at
6928the end of the operation, and it might change in the future.
6929
6930Conversions between @code{mpz_class} and @code{double}, however, are defined
6931to follow the corresponding C functions @code{mpz_get_d} and @code{mpz_set_d}.
6932And comparisons are always made exactly, as per @code{mpz_cmp_d}.
6933
6934
6935@node C++ Interface Rationals, C++ Interface Floats, C++ Interface Integers, C++ Class Interface
6936@section C++ Interface Rationals
6937
6938In all the following constructors, if a fraction is given then it should be in
6939canonical form, or if not then @code{mpq_class::canonicalize} called.
6940
6941@deftypefun {} mpq_class::mpq_class (type @var{op})
6942@deftypefunx {} mpq_class::mpq_class (integer @var{num}, integer @var{den})
6943Construct an @code{mpq_class}. The initial value can be a single value of any
6944type (conversion from @code{mpf_class} is @code{explicit}), or a pair of
6945integers (@code{mpz_class} or standard C++ integer types) representing a
6946fraction, except that @code{long long} and @code{long double} are not
6947supported. For example,
6948
6949@example
6950mpq_class q (99);
6951mpq_class q (1.75);
6952mpq_class q (1, 3);
6953@end example
6954@end deftypefun
6955
6956@deftypefun explicit mpq_class::mpq_class (const mpq_t @var{q})
6957Construct an @code{mpq_class} from an @code{mpq_t}. The value in @var{q} is
6958copied into the new @code{mpq_class}, there won't be any permanent association
6959between it and @var{q}.
6960@end deftypefun
6961
6962@deftypefun explicit mpq_class::mpq_class (const char *@var{s}, int @var{base} = 0)
6963@deftypefunx explicit mpq_class::mpq_class (const string& @var{s}, int @var{base} = 0)
6964Construct an @code{mpq_class} converted from a string using @code{mpq_set_str}
6965(@pxref{Initializing Rationals}).
6966
6967If the string is not a valid rational, an @code{std::invalid_argument}
6968exception is thrown. The same applies to @code{operator=}.
6969@end deftypefun
6970
6971@deftypefun mpq_class operator"" _mpq (const char *@var{str})
6972With C++11 compilers, integral rationals can be constructed with the syntax
6973@code{123_mpq} which is equivalent to @code{mpq_class(123_mpz)}. Other
6974rationals can be built as @code{-1_mpq/2} or @code{0xb_mpq/123456_mpz}.
6975@end deftypefun
6976
6977@deftypefun void mpq_class::canonicalize ()
6978Put an @code{mpq_class} into canonical form, as per @ref{Rational Number
6979Functions}. All arithmetic operators require their operands in canonical
6980form, and will return results in canonical form.
6981@end deftypefun
6982
6983@deftypefun mpq_class abs (mpq_class @var{op})
6984@deftypefunx int cmp (mpq_class @var{op1}, type @var{op2})
6985@deftypefunx int cmp (type @var{op1}, mpq_class @var{op2})
6986@maybepagebreak
6987@deftypefunx double mpq_class::get_d (void)
6988@deftypefunx string mpq_class::get_str (int @var{base} = 10)
6989@maybepagebreak
6990@deftypefunx int mpq_class::set_str (const char *@var{str}, int @var{base})
6991@deftypefunx int mpq_class::set_str (const string& @var{str}, int @var{base})
6992@deftypefunx int sgn (mpq_class @var{op})
6993@maybepagebreak
6994@deftypefunx void mpq_class::swap (mpq_class& @var{op})
6995@deftypefunx void swap (mpq_class& @var{op1}, mpq_class& @var{op2})
6996These functions provide a C++ class interface to the corresponding GMP C
6997routines.
6998
6999@code{cmp} can be used with any of the classes or the standard C++ types,
7000except @code{long long} and @code{long double}.
7001@end deftypefun
7002
7003@deftypefun {mpz_class&} mpq_class::get_num ()
7004@deftypefunx {mpz_class&} mpq_class::get_den ()
7005Get a reference to an @code{mpz_class} which is the numerator or denominator
7006of an @code{mpq_class}. This can be used both for read and write access. If
7007the object returned is modified, it modifies the original @code{mpq_class}.
7008
7009If direct manipulation might produce a non-canonical value, then
7010@code{mpq_class::canonicalize} must be called before further operations.
7011@end deftypefun
7012
7013@deftypefun mpz_t mpq_class::get_num_mpz_t ()
7014@deftypefunx mpz_t mpq_class::get_den_mpz_t ()
7015Get a reference to the underlying @code{mpz_t} numerator or denominator of an
7016@code{mpq_class}. This can be passed to C functions expecting an
7017@code{mpz_t}. Any modifications made to the @code{mpz_t} will modify the
7018original @code{mpq_class}.
7019
7020If direct manipulation might produce a non-canonical value, then
7021@code{mpq_class::canonicalize} must be called before further operations.
7022@end deftypefun
7023
7024@deftypefun istream& operator>> (istream& @var{stream}, mpq_class& @var{rop});
7025Read @var{rop} from @var{stream}, using its @code{ios} formatting settings,
7026the same as @code{mpq_t operator>>} (@pxref{C++ Formatted Input}).
7027
7028If the @var{rop} read might not be in canonical form then
7029@code{mpq_class::canonicalize} must be called.
7030@end deftypefun
7031
7032
7033@node C++ Interface Floats, C++ Interface Random Numbers, C++ Interface Rationals, C++ Class Interface
7034@section C++ Interface Floats
7035
7036When an expression requires the use of temporary intermediate @code{mpf_class}
7037values, like @code{f=g*h+x*y}, those temporaries will have the same precision
7038as the destination @code{f}. Explicit constructors can be used if this
7039doesn't suit.
7040
7041@deftypefun {} mpf_class::mpf_class (type @var{op})
7042@deftypefunx {} mpf_class::mpf_class (type @var{op}, mp_bitcnt_t @var{prec})
7043Construct an @code{mpf_class}. Any standard C++ type can be used, except
7044@code{long long} and @code{long double}, and any of the GMP C++ classes can be
7045used.
7046
7047If @var{prec} is given, the initial precision is that value, in bits. If
7048@var{prec} is not given, then the initial precision is determined by the type
7049of @var{op} given. An @code{mpz_class}, @code{mpq_class}, or C++
7050builtin type will give the default @code{mpf} precision (@pxref{Initializing
7051Floats}). An @code{mpf_class} or expression will give the precision of that
7052value. The precision of a binary expression is the higher of the two
7053operands.
7054
7055@example
7056mpf_class f(1.5); // default precision
7057mpf_class f(1.5, 500); // 500 bits (at least)
7058mpf_class f(x); // precision of x
7059mpf_class f(abs(x)); // precision of x
7060mpf_class f(-g, 1000); // 1000 bits (at least)
7061mpf_class f(x+y); // greater of precisions of x and y
7062@end example
7063@end deftypefun
7064
7065@deftypefun explicit mpf_class::mpf_class (const mpf_t @var{f})
7066@deftypefunx {} mpf_class::mpf_class (const mpf_t @var{f}, mp_bitcnt_t @var{prec})
7067Construct an @code{mpf_class} from an @code{mpf_t}. The value in @var{f} is
7068copied into the new @code{mpf_class}, there won't be any permanent association
7069between it and @var{f}.
7070
7071If @var{prec} is given, the initial precision is that value, in bits. If
7072@var{prec} is not given, then the initial precision is that of @var{f}.
7073@end deftypefun
7074
7075@deftypefun explicit mpf_class::mpf_class (const char *@var{s})
7076@deftypefunx {} mpf_class::mpf_class (const char *@var{s}, mp_bitcnt_t @var{prec}, int @var{base} = 0)
7077@deftypefunx explicit mpf_class::mpf_class (const string& @var{s})
7078@deftypefunx {} mpf_class::mpf_class (const string& @var{s}, mp_bitcnt_t @var{prec}, int @var{base} = 0)
7079Construct an @code{mpf_class} converted from a string using @code{mpf_set_str}
7080(@pxref{Assigning Floats}). If @var{prec} is given, the initial precision is
7081that value, in bits. If not, the default @code{mpf} precision
7082(@pxref{Initializing Floats}) is used.
7083
7084If the string is not a valid float, an @code{std::invalid_argument} exception
7085is thrown. The same applies to @code{operator=}.
7086@end deftypefun
7087
7088@deftypefun mpf_class operator"" _mpf (const char *@var{str})
7089With C++11 compilers, floats can be constructed with the syntax
7090@code{1.23e-1_mpf} which is equivalent to @code{mpf_class("1.23e-1")}.
7091@end deftypefun
7092
7093@deftypefun {mpf_class&} mpf_class::operator= (type @var{op})
7094Convert and store the given @var{op} value to an @code{mpf_class} object. The
7095same types are accepted as for the constructors above.
7096
7097Note that @code{operator=} only stores a new value, it doesn't copy or change
7098the precision of the destination, instead the value is truncated if necessary.
7099This is the same as @code{mpf_set} etc. Note in particular this means for
7100@code{mpf_class} a copy constructor is not the same as a default constructor
7101plus assignment.
7102
7103@example
7104mpf_class x (y); // x created with precision of y
7105
7106mpf_class x; // x created with default precision
7107x = y; // value truncated to that precision
7108@end example
7109
7110Applications using templated code may need to be careful about the assumptions
7111the code makes in this area, when working with @code{mpf_class} values of
7112various different or non-default precisions. For instance implementations of
7113the standard @code{complex} template have been seen in both styles above,
7114though of course @code{complex} is normally only actually specified for use
7115with the builtin float types.
7116@end deftypefun
7117
7118@deftypefun mpf_class abs (mpf_class @var{op})
7119@deftypefunx mpf_class ceil (mpf_class @var{op})
7120@deftypefunx int cmp (mpf_class @var{op1}, type @var{op2})
7121@deftypefunx int cmp (type @var{op1}, mpf_class @var{op2})
7122@maybepagebreak
7123@deftypefunx bool mpf_class::fits_sint_p (void)
7124@deftypefunx bool mpf_class::fits_slong_p (void)
7125@deftypefunx bool mpf_class::fits_sshort_p (void)
7126@maybepagebreak
7127@deftypefunx bool mpf_class::fits_uint_p (void)
7128@deftypefunx bool mpf_class::fits_ulong_p (void)
7129@deftypefunx bool mpf_class::fits_ushort_p (void)
7130@maybepagebreak
7131@deftypefunx mpf_class floor (mpf_class @var{op})
7132@deftypefunx mpf_class hypot (mpf_class @var{op1}, mpf_class @var{op2})
7133@maybepagebreak
7134@deftypefunx double mpf_class::get_d (void)
7135@deftypefunx long mpf_class::get_si (void)
7136@deftypefunx string mpf_class::get_str (mp_exp_t& @var{exp}, int @var{base} = 10, size_t @var{digits} = 0)
7137@deftypefunx {unsigned long} mpf_class::get_ui (void)
7138@maybepagebreak
7139@deftypefunx int mpf_class::set_str (const char *@var{str}, int @var{base})
7140@deftypefunx int mpf_class::set_str (const string& @var{str}, int @var{base})
7141@deftypefunx int sgn (mpf_class @var{op})
7142@deftypefunx mpf_class sqrt (mpf_class @var{op})
7143@maybepagebreak
7144@deftypefunx void mpf_class::swap (mpf_class& @var{op})
7145@deftypefunx void swap (mpf_class& @var{op1}, mpf_class& @var{op2})
7146@deftypefunx mpf_class trunc (mpf_class @var{op})
7147These functions provide a C++ class interface to the corresponding GMP C
7148routines.
7149
7150@code{cmp} can be used with any of the classes or the standard C++ types,
7151except @code{long long} and @code{long double}.
7152
7153The accuracy provided by @code{hypot} is not currently guaranteed.
7154@end deftypefun
7155
7156@deftypefun {mp_bitcnt_t} mpf_class::get_prec ()
7157@deftypefunx void mpf_class::set_prec (mp_bitcnt_t @var{prec})
7158@deftypefunx void mpf_class::set_prec_raw (mp_bitcnt_t @var{prec})
7159Get or set the current precision of an @code{mpf_class}.
7160
7161The restrictions described for @code{mpf_set_prec_raw} (@pxref{Initializing
7162Floats}) apply to @code{mpf_class::set_prec_raw}. Note in particular that the
7163@code{mpf_class} must be restored to it's allocated precision before being
7164destroyed. This must be done by application code, there's no automatic
7165mechanism for it.
7166@end deftypefun
7167
7168
7169@node C++ Interface Random Numbers, C++ Interface Limitations, C++ Interface Floats, C++ Class Interface
7170@section C++ Interface Random Numbers
7171
7172@deftp Class gmp_randclass
7173The C++ class interface to the GMP random number functions uses
7174@code{gmp_randclass} to hold an algorithm selection and current state, as per
7175@code{gmp_randstate_t}.
7176@end deftp
7177
7178@deftypefun {} gmp_randclass::gmp_randclass (void (*@var{randinit}) (gmp_randstate_t, @dots{}), @dots{})
7179Construct a @code{gmp_randclass}, using a call to the given @var{randinit}
7180function (@pxref{Random State Initialization}). The arguments expected are
7181the same as @var{randinit}, but with @code{mpz_class} instead of @code{mpz_t}.
7182For example,
7183
7184@example
7185gmp_randclass r1 (gmp_randinit_default);
7186gmp_randclass r2 (gmp_randinit_lc_2exp_size, 32);
7187gmp_randclass r3 (gmp_randinit_lc_2exp, a, c, m2exp);
7188gmp_randclass r4 (gmp_randinit_mt);
7189@end example
7190
7191@code{gmp_randinit_lc_2exp_size} will fail if the size requested is too big,
7192an @code{std::length_error} exception is thrown in that case.
7193@end deftypefun
7194
7195@deftypefun {} gmp_randclass::gmp_randclass (gmp_randalg_t @var{alg}, @dots{})
7196Construct a @code{gmp_randclass} using the same parameters as
7197@code{gmp_randinit} (@pxref{Random State Initialization}). This function is
7198obsolete and the above @var{randinit} style should be preferred.
7199@end deftypefun
7200
7201@deftypefun void gmp_randclass::seed (unsigned long int @var{s})
7202@deftypefunx void gmp_randclass::seed (mpz_class @var{s})
7203Seed a random number generator. See @pxref{Random Number Functions}, for how
7204to choose a good seed.
7205@end deftypefun
7206
7207@deftypefun mpz_class gmp_randclass::get_z_bits (mp_bitcnt_t @var{bits})
7208@deftypefunx mpz_class gmp_randclass::get_z_bits (mpz_class @var{bits})
7209Generate a random integer with a specified number of bits.
7210@end deftypefun
7211
7212@deftypefun mpz_class gmp_randclass::get_z_range (mpz_class @var{n})
7213Generate a random integer in the range 0 to @math{@var{n}-1} inclusive.
7214@end deftypefun
7215
7216@deftypefun mpf_class gmp_randclass::get_f ()
7217@deftypefunx mpf_class gmp_randclass::get_f (mp_bitcnt_t @var{prec})
7218Generate a random float @var{f} in the range @math{0 <= @var{f} < 1}. @var{f}
7219will be to @var{prec} bits precision, or if @var{prec} is not given then to
7220the precision of the destination. For example,
7221
7222@example
7223gmp_randclass r;
7224...
7225mpf_class f (0, 512); // 512 bits precision
7226f = r.get_f(); // random number, 512 bits
7227@end example
7228@end deftypefun
7229
7230
7231
7232@node C++ Interface Limitations, , C++ Interface Random Numbers, C++ Class Interface
7233@section C++ Interface Limitations
7234
7235@table @asis
7236@item @code{mpq_class} and Templated Reading
7237A generic piece of template code probably won't know that @code{mpq_class}
7238requires a @code{canonicalize} call if inputs read with @code{operator>>}
7239might be non-canonical. This can lead to incorrect results.
7240
7241@code{operator>>} behaves as it does for reasons of efficiency. A
7242canonicalize can be quite time consuming on large operands, and is best
7243avoided if it's not necessary.
7244
7245But this potential difficulty reduces the usefulness of @code{mpq_class}.
7246Perhaps a mechanism to tell @code{operator>>} what to do will be adopted in
7247the future, maybe a preprocessor define, a global flag, or an @code{ios} flag
7248pressed into service. Or maybe, at the risk of inconsistency, the
7249@code{mpq_class} @code{operator>>} could canonicalize and leave @code{mpq_t}
7250@code{operator>>} not doing so, for use on those occasions when that's
7251acceptable. Send feedback or alternate ideas to @email{gmp-bugs@@gmplib.org}.
7252
7253@item Subclassing
7254Subclassing the GMP C++ classes works, but is not currently recommended.
7255
7256Expressions involving subclasses resolve correctly (or seem to), but in normal
7257C++ fashion the subclass doesn't inherit constructors and assignments.
7258There's many of those in the GMP classes, and a good way to reestablish them
7259in a subclass is not yet provided.
7260
7261@item Templated Expressions
7262A subtle difficulty exists when using expressions together with
7263application-defined template functions. Consider the following, with @code{T}
7264intended to be some numeric type,
7265
7266@example
7267template <class T>
7268T fun (const T &, const T &);
7269@end example
7270
7271@noindent
7272When used with, say, plain @code{mpz_class} variables, it works fine: @code{T}
7273is resolved as @code{mpz_class}.
7274
7275@example
7276mpz_class f(1), g(2);
7277fun (f, g); // Good
7278@end example
7279
7280@noindent
7281But when one of the arguments is an expression, it doesn't work.
7282
7283@example
7284mpz_class f(1), g(2), h(3);
7285fun (f, g+h); // Bad
7286@end example
7287
7288This is because @code{g+h} ends up being a certain expression template type
7289internal to @code{gmpxx.h}, which the C++ template resolution rules are unable
7290to automatically convert to @code{mpz_class}. The workaround is simply to add
7291an explicit cast.
7292
7293@example
7294mpz_class f(1), g(2), h(3);
7295fun (f, mpz_class(g+h)); // Good
7296@end example
7297
7298Similarly, within @code{fun} it may be necessary to cast an expression to type
7299@code{T} when calling a templated @code{fun2}.
7300
7301@example
7302template <class T>
7303void fun (T f, T g)
7304@{
7305 fun2 (f, f+g); // Bad
7306@}
7307
7308template <class T>
7309void fun (T f, T g)
7310@{
7311 fun2 (f, T(f+g)); // Good
7312@}
7313@end example
7314
7315@item C++11
7316C++11 provides several new ways in which types can be inferred: @code{auto},
7317@code{decltype}, etc. While they can be very convenient, they don't mix well
7318with expression templates. In this example, the addition is performed twice,
7319as if we had defined @code{sum} as a macro.
7320
7321@example
7322mpz_class z = 33;
7323auto sum = z + z;
7324mpz_class prod = sum * sum;
7325@end example
7326
7327This other example may crash, though some compilers might make it look like
7328it is working, because the expression @code{z+z} goes out of scope before it
7329is evaluated.
7330
7331@example
7332mpz_class z = 33;
7333auto sum = z + z + z;
7334mpz_class prod = sum * 2;
7335@end example
7336
7337It is thus strongly recommended to avoid @code{auto} anywhere a GMP C++
7338expression may appear.
7339@end table
7340
7341
7342@node Custom Allocation, Language Bindings, C++ Class Interface, Top
7343@comment node-name, next, previous, up
7344@chapter Custom Allocation
7345@cindex Custom allocation
7346@cindex Memory allocation
7347@cindex Allocation of memory
7348
7349By default GMP uses @code{malloc}, @code{realloc} and @code{free} for memory
7350allocation, and if they fail GMP prints a message to the standard error output
7351and terminates the program.
7352
7353Alternate functions can be specified, to allocate memory in a different way or
7354to have a different error action on running out of memory.
7355
7356@deftypefun void mp_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t, size_t), @* void (*@var{free_func_ptr}) (void *, size_t))
7357Replace the current allocation functions from the arguments. If an argument
7358is @code{NULL}, the corresponding default function is used.
7359
7360These functions will be used for all memory allocation done by GMP, apart from
7361temporary space from @code{alloca} if that function is available and GMP is
7362configured to use it (@pxref{Build Options}).
7363
7364@strong{Be sure to call @code{mp_set_memory_functions} only when there are no
7365active GMP objects allocated using the previous memory functions! Usually
7366that means calling it before any other GMP function.}
7367@end deftypefun
7368
7369The functions supplied should fit the following declarations:
7370
7371@deftypevr Function {void *} allocate_function (size_t @var{alloc_size})
7372Return a pointer to newly allocated space with at least @var{alloc_size}
7373bytes.
7374@end deftypevr
7375
7376@deftypevr Function {void *} reallocate_function (void *@var{ptr}, size_t @var{old_size}, size_t @var{new_size})
7377Resize a previously allocated block @var{ptr} of @var{old_size} bytes to be
7378@var{new_size} bytes.
7379
7380The block may be moved if necessary or if desired, and in that case the
7381smaller of @var{old_size} and @var{new_size} bytes must be copied to the new
7382location. The return value is a pointer to the resized block, that being the
7383new location if moved or just @var{ptr} if not.
7384
7385@var{ptr} is never @code{NULL}, it's always a previously allocated block.
7386@var{new_size} may be bigger or smaller than @var{old_size}.
7387@end deftypevr
7388
7389@deftypevr Function void free_function (void *@var{ptr}, size_t @var{size})
7390De-allocate the space pointed to by @var{ptr}.
7391
7392@var{ptr} is never @code{NULL}, it's always a previously allocated block of
7393@var{size} bytes.
7394@end deftypevr
7395
7396A @dfn{byte} here means the unit used by the @code{sizeof} operator.
7397
7398The @var{reallocate_function} parameter @var{old_size} and the
7399@var{free_function} parameter @var{size} are passed for convenience, but of
7400course they can be ignored if not needed by an implementation. The default
7401functions using @code{malloc} and friends for instance don't use them.
7402
7403No error return is allowed from any of these functions, if they return then
7404they must have performed the specified operation. In particular note that
7405@var{allocate_function} or @var{reallocate_function} mustn't return
7406@code{NULL}.
7407
7408Getting a different fatal error action is a good use for custom allocation
7409functions, for example giving a graphical dialog rather than the default print
7410to @code{stderr}. How much is possible when genuinely out of memory is
7411another question though.
7412
7413There's currently no defined way for the allocation functions to recover from
7414an error such as out of memory, they must terminate program execution. A
7415@code{longjmp} or throwing a C++ exception will have undefined results. This
7416may change in the future.
7417
7418GMP may use allocated blocks to hold pointers to other allocated blocks. This
7419will limit the assumptions a conservative garbage collection scheme can make.
7420
7421Since the default GMP allocation uses @code{malloc} and friends, those
7422functions will be linked in even if the first thing a program does is an
7423@code{mp_set_memory_functions}. It's necessary to change the GMP sources if
7424this is a problem.
7425
7426@sp 1
7427@deftypefun void mp_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t, size_t), @* void (**@var{free_func_ptr}) (void *, size_t))
7428Get the current allocation functions, storing function pointers to the
7429locations given by the arguments. If an argument is @code{NULL}, that
7430function pointer is not stored.
7431
7432@need 1000
7433For example, to get just the current free function,
7434
7435@example
7436void (*freefunc) (void *, size_t);
7437
7438mp_get_memory_functions (NULL, NULL, &freefunc);
7439@end example
7440@end deftypefun
7441
7442@node Language Bindings, Algorithms, Custom Allocation, Top
7443@chapter Language Bindings
7444@cindex Language bindings
7445@cindex Other languages
7446
7447The following packages and projects offer access to GMP from languages other
7448than C, though perhaps with varying levels of functionality and efficiency.
7449
7450@c @spaceuref{U} is the same as @uref{U}, but with a couple of extra spaces
7451@c in tex, just to separate the URL from the preceding text a bit.
7452@iftex
7453@macro spaceuref {U}
7454@ @ @uref{\U\}
7455@end macro
7456@end iftex
7457@ifnottex
7458@macro spaceuref {U}
7459@uref{\U\}
7460@end macro
7461@end ifnottex
7462
7463@sp 1
7464@table @asis
7465@item C++
7466@itemize @bullet
7467@item
7468GMP C++ class interface, @pxref{C++ Class Interface} @* Straightforward
7469interface, expression templates to eliminate temporaries.
7470@item
7471ALP @spaceuref{https://www-sop.inria.fr/saga/logiciels/ALP/} @* Linear algebra and
7472polynomials using templates.
7473@item
7474CLN @spaceuref{https://www.ginac.de/CLN/} @* High level classes for arithmetic.
7475@item
7476Linbox @spaceuref{http://www.linalg.org/} @* Sparse vectors and matrices.
7477@item
7478NTL @spaceuref{http://www.shoup.net/ntl/} @* A C++ number theory library.
7479@end itemize
7480
7481@c @item D
7482@c @itemize @bullet
7483@c @item
7484@c gmp-d @spaceuref{http://home.comcast.net/~benhinkle/gmp-d/}
7485@c @end itemize
7486
7487@item Eiffel
7488@itemize @bullet
7489@item
7490Eiffelroom @spaceuref{http://www.eiffelroom.org/node/442}
7491@end itemize
7492
7493@c @item Fortran
7494@c @itemize @bullet
7495@c @item
7496@c Omni F77 @spaceuref{http://phase.hpcc.jp/Omni/home.html} @* Arbitrary
7497@c precision floats.
7498@c @end itemize
7499
7500@item Haskell
7501@itemize @bullet
7502@item
7503Glasgow Haskell Compiler @spaceuref{https://www.haskell.org/ghc/}
7504@end itemize
7505
7506@item Java
7507@itemize @bullet
7508@item
7509Kaffe @spaceuref{https://github.com/kaffe/kaffe}
7510@end itemize
7511
7512@item Lisp
7513@itemize @bullet
7514@item
7515GNU Common Lisp @spaceuref{https://www.gnu.org/software/gcl/gcl.html}
7516@item
7517Librep @spaceuref{http://librep.sourceforge.net/}
7518@item
7519@c FIXME: When there's a stable release with gmp support, just refer to it
7520@c rather than bothering to talk about betas.
7521XEmacs (21.5.18 beta and up) @spaceuref{https://www.xemacs.org} @* Optional
7522big integers, rationals and floats using GMP.
7523@end itemize
7524
7525@item ML
7526@itemize @bullet
7527@item
7528MLton compiler @spaceuref{http://mlton.org/}
7529@end itemize
7530
7531@item Objective Caml
7532@itemize @bullet
7533@item
7534MLGMP @spaceuref{https://opam.ocaml.org/packages/mlgmp/}
7535@item
7536Numerix @spaceuref{http://pauillac.inria.fr/~quercia/} @* Optionally using
7537GMP.
7538@end itemize
7539
7540@item Oz
7541@itemize @bullet
7542@item
7543Mozart @spaceuref{https://mozart.github.io/}
7544@end itemize
7545
7546@item Pascal
7547@itemize @bullet
7548@item
7549GNU Pascal Compiler @spaceuref{http://www.gnu-pascal.de/} @* GMP unit.
7550@item
7551Numerix @spaceuref{http://pauillac.inria.fr/~quercia/} @* For Free Pascal,
7552optionally using GMP.
7553@end itemize
7554
7555@item Perl
7556@itemize @bullet
7557@item
7558GMP module, see @file{demos/perl} in the GMP sources (@pxref{Demonstration
7559Programs}).
7560@item
7561Math::GMP @spaceuref{https://www.cpan.org/} @* Compatible with Math::BigInt, but
7562not as many functions as the GMP module above.
7563@item
7564Math::BigInt::GMP @spaceuref{https://www.cpan.org/} @* Plug Math::GMP into
7565normal Math::BigInt operations.
7566@end itemize
7567
7568@need 1000
7569@item Pike
7570@itemize @bullet
7571@item
7572pikempz module in the standard distribution, @uref{https://pike.lysator.liu.se/}
7573@end itemize
7574
7575@need 500
7576@item Prolog
7577@itemize @bullet
7578@item
7579SWI Prolog @spaceuref{http://www.swi-prolog.org/} @*
7580Arbitrary precision floats.
7581@end itemize
7582
7583@item Python
7584@itemize @bullet
7585@item
7586GMPY @uref{https://code.google.com/p/gmpy/}
7587@end itemize
7588
7589@item Ruby
7590@itemize @bullet
7591@item
7592@uref{https://rubygems.org/gems/gmp}
7593@end itemize
7594
7595@item Scheme
7596@itemize @bullet
7597@item
7598GNU Guile @spaceuref{https://www.gnu.org/software/guile/guile.html}
7599@item
7600RScheme @spaceuref{https://www.rscheme.org/}
7601@item
7602STklos @spaceuref{http://www.stklos.net/}
7603@c
7604@c For reference, MzScheme uses some of gmp, but (as of version 205) it only
7605@c has copies of some of the generic C code, and we don't consider that a
7606@c language binding to gmp.
7607@c
7608@end itemize
7609
7610@item Smalltalk
7611@itemize @bullet
7612@item
7613GNU Smalltalk @spaceuref{http://smalltalk.gnu.org/}
7614@end itemize
7615
7616@item Other
7617@itemize @bullet
7618@item
7619Axiom @uref{https://savannah.nongnu.org/projects/axiom} @* Computer algebra
7620using GCL.
7621@item
7622DrGenius @spaceuref{http://drgenius.seul.org/} @* Geometry system and
7623mathematical programming language.
7624@item
7625GiNaC @spaceuref{httsp://www.ginac.de/} @* C++ computer algebra using CLN.
7626@item
7627GOO @spaceuref{https://www.eecs.berkeley.edu/~jrb/goo/} @* Dynamic object oriented
7628language.
7629@item
7630Maxima @uref{https://www.ma.utexas.edu/users/wfs/maxima.html} @* Macsyma
7631computer algebra using GCL.
7632@c @item
7633@c Q @spaceuref{http://q-lang.sourceforge.net/} @* Equational programming system.
7634@item
7635Regina @spaceuref{http://regina.sourceforge.net/} @* Topological calculator.
7636@item
7637Yacas @spaceuref{http://yacas.sourceforge.net} @* Yet another computer algebra system.
7638@end itemize
7639
7640@end table
7641
7642
7643@node Algorithms, Internals, Language Bindings, Top
7644@chapter Algorithms
7645@cindex Algorithms
7646
7647This chapter is an introduction to some of the algorithms used for various GMP
7648operations. The code is likely to be hard to understand without knowing
7649something about the algorithms.
7650
7651Some GMP internals are mentioned, but applications that expect to be
7652compatible with future GMP releases should take care to use only the
7653documented functions.
7654
7655@menu
7656* Multiplication Algorithms::
7657* Division Algorithms::
7658* Greatest Common Divisor Algorithms::
7659* Powering Algorithms::
7660* Root Extraction Algorithms::
7661* Radix Conversion Algorithms::
7662* Other Algorithms::
7663* Assembly Coding::
7664@end menu
7665
7666
7667@node Multiplication Algorithms, Division Algorithms, Algorithms, Algorithms
7668@section Multiplication
7669@cindex Multiplication algorithms
7670
7671N@cross{}N limb multiplications and squares are done using one of seven
7672algorithms, as the size N increases.
7673
7674@quotation
7675@multitable {KaratsubaMMM} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
7676@item Algorithm @tab Threshold
7677@item Basecase @tab (none)
7678@item Karatsuba @tab @code{MUL_TOOM22_THRESHOLD}
7679@item Toom-3 @tab @code{MUL_TOOM33_THRESHOLD}
7680@item Toom-4 @tab @code{MUL_TOOM44_THRESHOLD}
7681@item Toom-6.5 @tab @code{MUL_TOOM6H_THRESHOLD}
7682@item Toom-8.5 @tab @code{MUL_TOOM8H_THRESHOLD}
7683@item FFT @tab @code{MUL_FFT_THRESHOLD}
7684@end multitable
7685@end quotation
7686
7687Similarly for squaring, with the @code{SQR} thresholds.
7688
7689N@cross{}M multiplications of operands with different sizes above
7690@code{MUL_TOOM22_THRESHOLD} are currently done by special Toom-inspired
7691algorithms or directly with FFT, depending on operand size (@pxref{Unbalanced
7692Multiplication}).
7693
7694@menu
7695* Basecase Multiplication::
7696* Karatsuba Multiplication::
7697* Toom 3-Way Multiplication::
7698* Toom 4-Way Multiplication::
7699* Higher degree Toom'n'half::
7700* FFT Multiplication::
7701* Other Multiplication::
7702* Unbalanced Multiplication::
7703@end menu
7704
7705
7706@node Basecase Multiplication, Karatsuba Multiplication, Multiplication Algorithms, Multiplication Algorithms
7707@subsection Basecase Multiplication
7708
7709Basecase N@cross{}M multiplication is a straightforward rectangular set of
7710cross-products, the same as long multiplication done by hand and for that
7711reason sometimes known as the schoolbook or grammar school method. This is an
7712@m{O(NM),O(N*M)} algorithm. See Knuth section 4.3.1 algorithm M
7713(@pxref{References}), and the @file{mpn/generic/mul_basecase.c} code.
7714
7715Assembly implementations of @code{mpn_mul_basecase} are essentially the same
7716as the generic C code, but have all the usual assembly tricks and
7717obscurities introduced for speed.
7718
7719A square can be done in roughly half the time of a multiply, by using the fact
7720that the cross products above and below the diagonal are the same. A triangle
7721of products below the diagonal is formed, doubled (left shift by one bit), and
7722then the products on the diagonal added. This can be seen in
7723@file{mpn/generic/sqr_basecase.c}. Again the assembly implementations take
7724essentially the same approach.
7725
7726@tex
7727\def\GMPline#1#2#3#4#5#6{%
7728 \hbox {%
7729 \vrule height 2.5ex depth 1ex
7730 \hbox to 2em {\hfil{#2}\hfil}%
7731 \vrule \hbox to 2em {\hfil{#3}\hfil}%
7732 \vrule \hbox to 2em {\hfil{#4}\hfil}%
7733 \vrule \hbox to 2em {\hfil{#5}\hfil}%
7734 \vrule \hbox to 2em {\hfil{#6}\hfil}%
7735 \vrule}}
7736\GMPdisplay{
7737 \hbox{%
7738 \vbox{%
7739 \hbox to 1.5em {\vrule height 2.5ex depth 1ex width 0pt}%
7740 \hbox {\vrule height 2.5ex depth 1ex width 0pt u0\hfil}%
7741 \hbox {\vrule height 2.5ex depth 1ex width 0pt u1\hfil}%
7742 \hbox {\vrule height 2.5ex depth 1ex width 0pt u2\hfil}%
7743 \hbox {\vrule height 2.5ex depth 1ex width 0pt u3\hfil}%
7744 \hbox {\vrule height 2.5ex depth 1ex width 0pt u4\hfil}%
7745 \vfill}%
7746 \vbox{%
7747 \hbox{%
7748 \hbox to 2em {\hfil u0\hfil}%
7749 \hbox to 2em {\hfil u1\hfil}%
7750 \hbox to 2em {\hfil u2\hfil}%
7751 \hbox to 2em {\hfil u3\hfil}%
7752 \hbox to 2em {\hfil u4\hfil}}%
7753 \vskip 0.7ex
7754 \hrule
7755 \GMPline{u0}{d}{}{}{}{}%
7756 \hrule
7757 \GMPline{u1}{}{d}{}{}{}%
7758 \hrule
7759 \GMPline{u2}{}{}{d}{}{}%
7760 \hrule
7761 \GMPline{u3}{}{}{}{d}{}%
7762 \hrule
7763 \GMPline{u4}{}{}{}{}{d}%
7764 \hrule}}}
7765@end tex
7766@ifnottex
7767@example
7768@group
7769 u0 u1 u2 u3 u4
7770 +---+---+---+---+---+
7771u0 | d | | | | |
7772 +---+---+---+---+---+
7773u1 | | d | | | |
7774 +---+---+---+---+---+
7775u2 | | | d | | |
7776 +---+---+---+---+---+
7777u3 | | | | d | |
7778 +---+---+---+---+---+
7779u4 | | | | | d |
7780 +---+---+---+---+---+
7781@end group
7782@end example
7783@end ifnottex
7784
7785In practice squaring isn't a full 2@cross{} faster than multiplying, it's
7786usually around 1.5@cross{}. Less than 1.5@cross{} probably indicates
7787@code{mpn_sqr_basecase} wants improving on that CPU.
7788
7789On some CPUs @code{mpn_mul_basecase} can be faster than the generic C
7790@code{mpn_sqr_basecase} on some small sizes. @code{SQR_BASECASE_THRESHOLD} is
7791the size at which to use @code{mpn_sqr_basecase}, this will be zero if that
7792routine should be used always.
7793
7794
7795@node Karatsuba Multiplication, Toom 3-Way Multiplication, Basecase Multiplication, Multiplication Algorithms
7796@subsection Karatsuba Multiplication
7797@cindex Karatsuba multiplication
7798
7799The Karatsuba multiplication algorithm is described in Knuth section 4.3.3
7800part A, and various other textbooks. A brief description is given here.
7801
7802The inputs @math{x} and @math{y} are treated as each split into two parts of
7803equal length (or the most significant part one limb shorter if N is odd).
7804
7805@tex
7806% GMPboxwidth used for all the multiplication pictures
7807\global\newdimen\GMPboxwidth \global\GMPboxwidth=5em
7808% GMPboxdepth and GMPboxheight are also used for the float pictures
7809\global\newdimen\GMPboxdepth \global\GMPboxdepth=1ex
7810\global\newdimen\GMPboxheight \global\GMPboxheight=2ex
7811\gdef\GMPvrule{\vrule height \GMPboxheight depth \GMPboxdepth}
7812\def\GMPbox#1#2{%
7813 \vbox {%
7814 \hrule
7815 \hbox to 2\GMPboxwidth{%
7816 \GMPvrule \hfil $#1$\hfil \vrule \hfil $#2$\hfil \vrule}%
7817 \hrule}}
7818\GMPdisplay{%
7819\vbox{%
7820 \hbox to 2\GMPboxwidth {high \hfil low}
7821 \vskip 0.7ex
7822 \GMPbox{x_1}{x_0}
7823 \vskip 0.5ex
7824 \GMPbox{y_1}{y_0}
7825}}
7826@end tex
7827@ifnottex
7828@example
7829@group
7830 high low
7831+----------+----------+
7832| x1 | x0 |
7833+----------+----------+
7834
7835+----------+----------+
7836| y1 | y0 |
7837+----------+----------+
7838@end group
7839@end example
7840@end ifnottex
7841
7842Let @math{b} be the power of 2 where the split occurs, i.e.@: if @ms{x,0} is
7843@math{k} limbs (@ms{y,0} the same) then
7844@m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}}, b=2^(k*mp_bits_per_limb)}.
7845With that @m{x=x_1b+x_0,x=x1*b+x0} and @m{y=y_1b+y_0,y=y1*b+y0}, and the
7846following holds,
7847
7848@display
7849@m{xy = (b^2+b)x_1y_1 - b(x_1-x_0)(y_1-y_0) + (b+1)x_0y_0,
7850 x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0}
7851@end display
7852
7853This formula means doing only three multiplies of (N/2)@cross{}(N/2) limbs,
7854whereas a basecase multiply of N@cross{}N limbs is equivalent to four
7855multiplies of (N/2)@cross{}(N/2). The factors @math{(b^2+b)} etc represent
7856the positions where the three products must be added.
7857
7858@tex
7859\def\GMPboxA#1#2{%
7860 \vbox{%
7861 \hrule
7862 \hbox{%
7863 \GMPvrule
7864 \hbox to 2\GMPboxwidth {\hfil\hbox{$#1$}\hfil}%
7865 \vrule
7866 \hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%
7867 \vrule}
7868 \hrule}}
7869\def\GMPboxB#1#2{%
7870 \hbox{%
7871 \raise \GMPboxdepth \hbox to \GMPboxwidth {\hfil #1\hskip 0.5em}%
7872 \vbox{%
7873 \hrule
7874 \hbox{%
7875 \GMPvrule
7876 \hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%
7877 \vrule}%
7878 \hrule}}}
7879\GMPdisplay{%
7880\vbox{%
7881 \hbox to 4\GMPboxwidth {high \hfil low}
7882 \vskip 0.7ex
7883 \GMPboxA{x_1y_1}{x_0y_0}
7884 \vskip 0.5ex
7885 \GMPboxB{$+$}{x_1y_1}
7886 \vskip 0.5ex
7887 \GMPboxB{$+$}{x_0y_0}
7888 \vskip 0.5ex
7889 \GMPboxB{$-$}{(x_1-x_0)(y_1-y_0)}
7890}}
7891@end tex
7892@ifnottex
7893@example
7894@group
7895 high low
7896+--------+--------+ +--------+--------+
7897| x1*y1 | | x0*y0 |
7898+--------+--------+ +--------+--------+
7899 +--------+--------+
7900 add | x1*y1 |
7901 +--------+--------+
7902 +--------+--------+
7903 add | x0*y0 |
7904 +--------+--------+
7905 +--------+--------+
7906 sub | (x1-x0)*(y1-y0) |
7907 +--------+--------+
7908@end group
7909@end example
7910@end ifnottex
7911
7912The term @m{(x_1-x_0)(y_1-y_0),(x1-x0)*(y1-y0)} is best calculated as an
7913absolute value, and the sign used to choose to add or subtract. Notice the
7914sum @m{\mathop{\rm high}(x_0y_0)+\mathop{\rm low}(x_1y_1),
7915high(x0*y0)+low(x1*y1)} occurs twice, so it's possible to do @m{5k,5*k} limb
7916additions, rather than @m{6k,6*k}, but in GMP extra function call overheads
7917outweigh the saving.
7918
7919Squaring is similar to multiplying, but with @math{x=y} the formula reduces to
7920an equivalent with three squares,
7921
7922@display
7923@m{x^2 = (b^2+b)x_1^2 - b(x_1-x_0)^2 + (b+1)x_0^2,
7924 x^2 = (b^2+b)*x1^2 - b*(x1-x0)^2 + (b+1)*x0^2}
7925@end display
7926
7927The final result is accumulated from those three squares the same way as for
7928the three multiplies above. The middle term @m{(x_1-x_0)^2,(x1-x0)^2} is now
7929always positive.
7930
7931A similar formula for both multiplying and squaring can be constructed with a
7932middle term @m{(x_1+x_0)(y_1+y_0),(x1+x0)*(y1+y0)}. But those sums can exceed
7933@math{k} limbs, leading to more carry handling and additions than the form
7934above.
7935
7936Karatsuba multiplication is asymptotically an @math{O(N^@W{1.585})} algorithm,
7937the exponent being @m{\log3/\log2,log(3)/log(2)}, representing 3 multiplies
7938each @math{1/2} the size of the inputs. This is a big improvement over the
7939basecase multiply at @math{O(N^2)} and the advantage soon overcomes the extra
7940additions Karatsuba performs. @code{MUL_TOOM22_THRESHOLD} can be as little
7941as 10 limbs. The @code{SQR} threshold is usually about twice the @code{MUL}.
7942
7943The basecase algorithm will take a time of the form @m{M(N) = aN^2 + bN + c,
7944M(N) = a*N^2 + b*N + c} and the Karatsuba algorithm @m{K(N) = 3M(N/2) + dN +
7945e, K(N) = 3*M(N/2) + d*N + e}, which expands to @m{K(N) = {3\over4} aN^2 +
7946{3\over2} bN + 3c + dN + e, K(N) = 3/4*a*N^2 + 3/2*b*N + 3*c + d*N + e}. The
7947factor @m{3\over4, 3/4} for @math{a} means per-crossproduct speedups in the
7948basecase code will increase the threshold since they benefit @math{M(N)} more
7949than @math{K(N)}. And conversely the @m{3\over2, 3/2} for @math{b} means
7950linear style speedups of @math{b} will increase the threshold since they
7951benefit @math{K(N)} more than @math{M(N)}. The latter can be seen for
7952instance when adding an optimized @code{mpn_sqr_diagonal} to
7953@code{mpn_sqr_basecase}. Of course all speedups reduce total time, and in
7954that sense the algorithm thresholds are merely of academic interest.
7955
7956
7957@node Toom 3-Way Multiplication, Toom 4-Way Multiplication, Karatsuba Multiplication, Multiplication Algorithms
7958@subsection Toom 3-Way Multiplication
7959@cindex Toom multiplication
7960
7961The Karatsuba formula is the simplest case of a general approach to splitting
7962inputs that leads to both Toom and FFT algorithms. A description of
7963Toom can be found in Knuth section 4.3.3, with an example 3-way
7964calculation after Theorem A@. The 3-way form used in GMP is described here.
7965
7966The operands are each considered split into 3 pieces of equal length (or the
7967most significant part 1 or 2 limbs shorter than the other two).
7968
7969@tex
7970\def\GMPbox#1#2#3{%
7971 \vbox{%
7972 \hrule \vfil
7973 \hbox to 3\GMPboxwidth {%
7974 \GMPvrule
7975 \hfil$#1$\hfil
7976 \vrule
7977 \hfil$#2$\hfil
7978 \vrule
7979 \hfil$#3$\hfil
7980 \vrule}%
7981 \vfil \hrule
7982}}
7983\GMPdisplay{%
7984\vbox{%
7985 \hbox to 3\GMPboxwidth {high \hfil low}
7986 \vskip 0.7ex
7987 \GMPbox{x_2}{x_1}{x_0}
7988 \vskip 0.5ex
7989 \GMPbox{y_2}{y_1}{y_0}
7990 \vskip 0.5ex
7991}}
7992@end tex
7993@ifnottex
7994@example
7995@group
7996 high low
7997+----------+----------+----------+
7998| x2 | x1 | x0 |
7999+----------+----------+----------+
8000
8001+----------+----------+----------+
8002| y2 | y1 | y0 |
8003+----------+----------+----------+
8004@end group
8005@end example
8006@end ifnottex
8007
8008@noindent
8009These parts are treated as the coefficients of two polynomials
8010
8011@display
8012@group
8013@m{X(t) = x_2t^2 + x_1t + x_0,
8014 X(t) = x2*t^2 + x1*t + x0}
8015@m{Y(t) = y_2t^2 + y_1t + y_0,
8016 Y(t) = y2*t^2 + y1*t + y0}
8017@end group
8018@end display
8019
8020Let @math{b} equal the power of 2 which is the size of the @ms{x,0}, @ms{x,1},
8021@ms{y,0} and @ms{y,1} pieces, i.e.@: if they're @math{k} limbs each then
8022@m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}}, b=2^(k*mp_bits_per_limb)}.
8023With this @math{x=X(b)} and @math{y=Y(b)}.
8024
8025Let a polynomial @m{W(t)=X(t)Y(t),W(t)=X(t)*Y(t)} and suppose its coefficients
8026are
8027
8028@display
8029@m{W(t) = w_4t^4 + w_3t^3 + w_2t^2 + w_1t + w_0,
8030 W(t) = w4*t^4 + w3*t^3 + w2*t^2 + w1*t + w0}
8031@end display
8032
8033The @m{w_i,w[i]} are going to be determined, and when they are they'll give
8034the final result using @math{w=W(b)}, since
8035@m{xy=X(b)Y(b),x*y=X(b)*Y(b)=W(b)}. The coefficients will be roughly
8036@math{b^2} each, and the final @math{W(b)} will be an addition like,
8037
8038@tex
8039\def\GMPbox#1#2{%
8040 \moveright #1\GMPboxwidth
8041 \vbox{%
8042 \hrule
8043 \hbox{%
8044 \GMPvrule
8045 \hbox to 2\GMPboxwidth {\hfil$#2$\hfil}%
8046 \vrule}%
8047 \hrule
8048}}
8049\GMPdisplay{%
8050\vbox{%
8051 \hbox to 6\GMPboxwidth {high \hfil low}%
8052 \vskip 0.7ex
8053 \GMPbox{0}{w_4}
8054 \vskip 0.5ex
8055 \GMPbox{1}{w_3}
8056 \vskip 0.5ex
8057 \GMPbox{2}{w_2}
8058 \vskip 0.5ex
8059 \GMPbox{3}{w_1}
8060 \vskip 0.5ex
8061 \GMPbox{4}{w_0}
8062}}
8063@end tex
8064@ifnottex
8065@example
8066@group
8067 high low
8068+-------+-------+
8069| w4 |
8070+-------+-------+
8071 +--------+-------+
8072 | w3 |
8073 +--------+-------+
8074 +--------+-------+
8075 | w2 |
8076 +--------+-------+
8077 +--------+-------+
8078 | w1 |
8079 +--------+-------+
8080 +-------+-------+
8081 | w0 |
8082 +-------+-------+
8083@end group
8084@end example
8085@end ifnottex
8086
8087The @m{w_i,w[i]} coefficients could be formed by a simple set of cross
8088products, like @m{w_4=x_2y_2,w4=x2*y2}, @m{w_3=x_2y_1+x_1y_2,w3=x2*y1+x1*y2},
8089@m{w_2=x_2y_0+x_1y_1+x_0y_2,w2=x2*y0+x1*y1+x0*y2} etc, but this would need all
8090nine @m{x_iy_j,x[i]*y[j]} for @math{i,j=0,1,2}, and would be equivalent merely
8091to a basecase multiply. Instead the following approach is used.
8092
8093@math{X(t)} and @math{Y(t)} are evaluated and multiplied at 5 points, giving
8094values of @math{W(t)} at those points. In GMP the following points are used,
8095
8096@quotation
8097@multitable {@m{t=\infty,t=inf}M} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
8098@item Point @tab Value
8099@item @math{t=0} @tab @m{x_0y_0,x0 * y0}, which gives @ms{w,0} immediately
8100@item @math{t=1} @tab @m{(x_2+x_1+x_0)(y_2+y_1+y_0),(x2+x1+x0) * (y2+y1+y0)}
8101@item @math{t=-1} @tab @m{(x_2-x_1+x_0)(y_2-y_1+y_0),(x2-x1+x0) * (y2-y1+y0)}
8102@item @math{t=2} @tab @m{(4x_2+2x_1+x_0)(4y_2+2y_1+y_0),(4*x2+2*x1+x0) * (4*y2+2*y1+y0)}
8103@item @m{t=\infty,t=inf} @tab @m{x_2y_2,x2 * y2}, which gives @ms{w,4} immediately
8104@end multitable
8105@end quotation
8106
8107At @math{t=-1} the values can be negative and that's handled using the
8108absolute values and tracking the sign separately. At @m{t=\infty,t=inf} the
8109value is actually @m{\lim_{t\to\infty} {X(t)Y(t)\over t^4}, X(t)*Y(t)/t^4 in
8110the limit as t approaches infinity}, but it's much easier to think of as
8111simply @m{x_2y_2,x2*y2} giving @ms{w,4} immediately (much like
8112@m{x_0y_0,x0*y0} at @math{t=0} gives @ms{w,0} immediately).
8113
8114Each of the points substituted into
8115@m{W(t)=w_4t^4+\cdots+w_0,W(t)=w4*t^4+@dots{}+w0} gives a linear combination
8116of the @m{w_i,w[i]} coefficients, and the value of those combinations has just
8117been calculated.
8118
8119@tex
8120\GMPdisplay{%
8121$\matrix{%
8122W(0) & = & & & & & & & & & w_0 \cr
8123W(1) & = & w_4 & + & w_3 & + & w_2 & + & w_1 & + & w_0 \cr
8124W(-1) & = & w_4 & - & w_3 & + & w_2 & - & w_1 & + & w_0 \cr
8125W(2) & = & 16w_4 & + & 8w_3 & + & 4w_2 & + & 2w_1 & + & w_0 \cr
8126W(\infty) & = & w_4 \cr
8127}$}
8128@end tex
8129@ifnottex
8130@example
8131@group
8132W(0) = w0
8133W(1) = w4 + w3 + w2 + w1 + w0
8134W(-1) = w4 - w3 + w2 - w1 + w0
8135W(2) = 16*w4 + 8*w3 + 4*w2 + 2*w1 + w0
8136W(inf) = w4
8137@end group
8138@end example
8139@end ifnottex
8140
8141This is a set of five equations in five unknowns, and some elementary linear
8142algebra quickly isolates each @m{w_i,w[i]}. This involves adding or
8143subtracting one @math{W(t)} value from another, and a couple of divisions by
8144powers of 2 and one division by 3, the latter using the special
8145@code{mpn_divexact_by3} (@pxref{Exact Division}).
8146
8147The conversion of @math{W(t)} values to the coefficients is interpolation. A
8148polynomial of degree 4 like @math{W(t)} is uniquely determined by values known
8149at 5 different points. The points are arbitrary and can be chosen to make the
8150linear equations come out with a convenient set of steps for quickly isolating
8151the @m{w_i,w[i]}.
8152
8153Squaring follows the same procedure as multiplication, but there's only one
8154@math{X(t)} and it's evaluated at the 5 points, and those values squared to
8155give values of @math{W(t)}. The interpolation is then identical, and in fact
8156the same @code{toom_interpolate_5pts} subroutine is used for both squaring and
8157multiplying.
8158
8159Toom-3 is asymptotically @math{O(N^@W{1.465})}, the exponent being
8160@m{\log5/\log3,log(5)/log(3)}, representing 5 recursive multiplies of 1/3 the
8161original size each. This is an improvement over Karatsuba at
8162@math{O(N^@W{1.585})}, though Toom does more work in the evaluation and
8163interpolation and so it only realizes its advantage above a certain size.
8164
8165Near the crossover between Toom-3 and Karatsuba there's generally a range of
8166sizes where the difference between the two is small.
8167@code{MUL_TOOM33_THRESHOLD} is a somewhat arbitrary point in that range and
8168successive runs of the tune program can give different values due to small
8169variations in measuring. A graph of time versus size for the two shows the
8170effect, see @file{tune/README}.
8171
8172At the fairly small sizes where the Toom-3 thresholds occur it's worth
8173remembering that the asymptotic behaviour for Karatsuba and Toom-3 can't be
8174expected to make accurate predictions, due of course to the big influence of
8175all sorts of overheads, and the fact that only a few recursions of each are
8176being performed. Even at large sizes there's a good chance machine dependent
8177effects like cache architecture will mean actual performance deviates from
8178what might be predicted.
8179
8180The formula given for the Karatsuba algorithm (@pxref{Karatsuba
8181Multiplication}) has an equivalent for Toom-3 involving only five multiplies,
8182but this would be complicated and unenlightening.
8183
8184An alternate view of Toom-3 can be found in Zuras (@pxref{References}), using
8185a vector to represent the @math{x} and @math{y} splits and a matrix
8186multiplication for the evaluation and interpolation stages. The matrix
8187inverses are not meant to be actually used, and they have elements with values
8188much greater than in fact arise in the interpolation steps. The diagram shown
8189for the 3-way is attractive, but again doesn't have to be implemented that way
8190and for example with a bit of rearrangement just one division by 6 can be
8191done.
8192
8193
8194@node Toom 4-Way Multiplication, Higher degree Toom'n'half, Toom 3-Way Multiplication, Multiplication Algorithms
8195@subsection Toom 4-Way Multiplication
8196@cindex Toom multiplication
8197
8198Karatsuba and Toom-3 split the operands into 2 and 3 coefficients,
8199respectively. Toom-4 analogously splits the operands into 4 coefficients.
8200Using the notation from the section on Toom-3 multiplication, we form two
8201polynomials:
8202
8203@display
8204@group
8205@m{X(t) = x_3t^3 + x_2t^2 + x_1t + x_0,
8206 X(t) = x3*t^3 + x2*t^2 + x1*t + x0}
8207@m{Y(t) = y_3t^3 + y_2t^2 + y_1t + y_0,
8208 Y(t) = y3*t^3 + y2*t^2 + y1*t + y0}
8209@end group
8210@end display
8211
8212@math{X(t)} and @math{Y(t)} are evaluated and multiplied at 7 points, giving
8213values of @math{W(t)} at those points. In GMP the following points are used,
8214
8215@quotation
8216@multitable {@m{t=-1/2,t=inf}M} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
8217@item Point @tab Value
8218@item @math{t=0} @tab @m{x_0y_0,x0 * y0}, which gives @ms{w,0} immediately
8219@item @math{t=1/2} @tab @m{(x_3+2x_2+4x_1+8x_0)(y_3+2y_2+4y_1+8y_0),(x3+2*x2+4*x1+8*x0) * (y3+2*y2+4*y1+8*y0)}
8220@item @math{t=-1/2} @tab @m{(-x_3+2x_2-4x_1+8x_0)(-y_3+2y_2-4y_1+8y_0),(-x3+2*x2-4*x1+8*x0) * (-y3+2*y2-4*y1+8*y0)}
8221@item @math{t=1} @tab @m{(x_3+x_2+x_1+x_0)(y_3+y_2+y_1+y_0),(x3+x2+x1+x0) * (y3+y2+y1+y0)}
8222@item @math{t=-1} @tab @m{(-x_3+x_2-x_1+x_0)(-y_3+y_2-y_1+y_0),(-x3+x2-x1+x0) * (-y3+y2-y1+y0)}
8223@item @math{t=2} @tab @m{(8x_3+4x_2+2x_1+x_0)(8y_3+4y_2+2y_1+y_0),(8*x3+4*x2+2*x1+x0) * (8*y3+4*y2+2*y1+y0)}
8224@item @m{t=\infty,t=inf} @tab @m{x_3y_3,x3 * y3}, which gives @ms{w,6} immediately
8225@end multitable
8226@end quotation
8227
8228The number of additions and subtractions for Toom-4 is much larger than for Toom-3.
8229But several subexpressions occur multiple times, for example @m{x_2+x_0,x2+x0}, occurs
8230for both @math{t=1} and @math{t=-1}.
8231
8232Toom-4 is asymptotically @math{O(N^@W{1.404})}, the exponent being
8233@m{\log7/\log4,log(7)/log(4)}, representing 7 recursive multiplies of 1/4 the
8234original size each.
8235
8236
8237@node Higher degree Toom'n'half, FFT Multiplication, Toom 4-Way Multiplication, Multiplication Algorithms
8238@subsection Higher degree Toom'n'half
8239@cindex Toom multiplication
8240
8241The Toom algorithms described above (@pxref{Toom 3-Way Multiplication},
8242@pxref{Toom 4-Way Multiplication}) generalizes to split into an arbitrary
8243number of pieces. In general a split of two equally long operands into
8244@math{r} pieces leads to evaluations and pointwise multiplications done at
8245@m{2r-1,2*r-1} points. To fully exploit symmetries it would be better to have
8246a multiple of 4 points, that's why for higher degree Toom'n'half is used.
8247
8248Toom'n'half means that the existence of one more piece is considered for a
8249single operand. It can be virtual, i.e. zero, or real, when the two operand
8250are not exactly balanced. By choosing an even @math{r},
8251Toom-@m{r{1\over2},r+1/2} requires @math{2r} points, a multiple of four.
8252
8253The quadruplets of points include 0, @m{\infty,inf}, +1, -1 and
8254@m{\pm2^i,+-2^i}, @m{\pm2^{-i},+-2^-i} . Each of them giving shortcuts for the
8255evaluation phase and for some steps in the interpolation phase. Further tricks
8256are used to reduce the memory footprint of the whole multiplication algorithm
8257to a memory buffer equal in size to the result of the product.
8258
8259Current GMP uses both Toom-6'n'half and Toom-8'n'half.
8260
8261
8262@node FFT Multiplication, Other Multiplication, Higher degree Toom'n'half, Multiplication Algorithms
8263@subsection FFT Multiplication
8264@cindex FFT multiplication
8265@cindex Fast Fourier Transform
8266
8267At large to very large sizes a Fermat style FFT multiplication is used,
8268following Sch@"onhage and Strassen (@pxref{References}). Descriptions of FFTs
8269in various forms can be found in many textbooks, for instance Knuth section
82704.3.3 part C or Lipson chapter IX@. A brief description of the form used in
8271GMP is given here.
8272
8273The multiplication done is @m{xy \bmod 2^N+1, x*y mod 2^N+1}, for a given
8274@math{N}. A full product @m{xy,x*y} is obtained by choosing @m{N \ge
8275\mathop{\rm bits}(x)+\mathop{\rm bits}(y), N>=bits(x)+bits(y)} and padding
8276@math{x} and @math{y} with high zero limbs. The modular product is the native
8277form for the algorithm, so padding to get a full product is unavoidable.
8278
8279The algorithm follows a split, evaluate, pointwise multiply, interpolate and
8280combine similar to that described above for Karatsuba and Toom-3. A @math{k}
8281parameter controls the split, with an FFT-@math{k} splitting into @math{2^k}
8282pieces of @math{M=N/2^k} bits each. @math{N} must be a multiple of
8283@m{2^k\times@code{mp\_bits\_per\_limb}, (2^k)*@nicode{mp_bits_per_limb}} so
8284the split falls on limb boundaries, avoiding bit shifts in the split and
8285combine stages.
8286
8287The evaluations, pointwise multiplications, and interpolation, are all done
8288modulo @m{2^{N'}+1, 2^N'+1} where @math{N'} is @math{2M+k+3} rounded up to a
8289multiple of @math{2^k} and of @code{mp_bits_per_limb}. The results of
8290interpolation will be the following negacyclic convolution of the input
8291pieces, and the choice of @math{N'} ensures these sums aren't truncated.
8292@tex
8293$$ w_n = \sum_{{i+j = b2^k+n}\atop{b=0,1}} (-1)^b x_i y_j $$
8294@end tex
8295@ifnottex
8296
8297@example
8298 ---
8299 \ b
8300w[n] = / (-1) * x[i] * y[j]
8301 ---
8302 i+j==b*2^k+n
8303 b=0,1
8304@end example
8305
8306@end ifnottex
8307The points used for the evaluation are @math{g^i} for @math{i=0} to
8308@math{2^k-1} where @m{g=2^{2N'/2^k}, g=2^(2N'/2^k)}. @math{g} is a
8309@m{2^k,2^k'}th root of unity mod @m{2^{N'}+1,2^N'+1}, which produces necessary
8310cancellations at the interpolation stage, and it's also a power of 2 so the
8311fast Fourier transforms used for the evaluation and interpolation do only
8312shifts, adds and negations.
8313
8314The pointwise multiplications are done modulo @m{2^{N'}+1, 2^N'+1} and either
8315recurse into a further FFT or use a plain multiplication (Toom-3, Karatsuba or
8316basecase), whichever is optimal at the size @math{N'}. The interpolation is
8317an inverse fast Fourier transform. The resulting set of sums of @m{x_iy_j,
8318x[i]*y[j]} are added at appropriate offsets to give the final result.
8319
8320Squaring is the same, but @math{x} is the only input so it's one transform at
8321the evaluate stage and the pointwise multiplies are squares. The
8322interpolation is the same.
8323
8324For a mod @math{2^N+1} product, an FFT-@math{k} is an @m{O(N^{k/(k-1)}),
8325O(N^(k/(k-1)))} algorithm, the exponent representing @math{2^k} recursed
8326modular multiplies each @m{1/2^{k-1},1/2^(k-1)} the size of the original.
8327Each successive @math{k} is an asymptotic improvement, but overheads mean each
8328is only faster at bigger and bigger sizes. In the code, @code{MUL_FFT_TABLE}
8329and @code{SQR_FFT_TABLE} are the thresholds where each @math{k} is used. Each
8330new @math{k} effectively swaps some multiplying for some shifts, adds and
8331overheads.
8332
8333A mod @math{2^N+1} product can be formed with a normal
8334@math{N@cross{}N@rightarrow{}2N} bit multiply plus a subtraction, so an FFT
8335and Toom-3 etc can be compared directly. A @math{k=4} FFT at
8336@math{O(N^@W{1.333})} can be expected to be the first faster than Toom-3 at
8337@math{O(N^@W{1.465})}. In practice this is what's found, with
8338@code{MUL_FFT_MODF_THRESHOLD} and @code{SQR_FFT_MODF_THRESHOLD} being between
8339300 and 1000 limbs, depending on the CPU@. So far it's been found that only
8340very large FFTs recurse into pointwise multiplies above these sizes.
8341
8342When an FFT is to give a full product, the change of @math{N} to @math{2N}
8343doesn't alter the theoretical complexity for a given @math{k}, but for the
8344purposes of considering where an FFT might be first used it can be assumed
8345that the FFT is recursing into a normal multiply and that on that basis it's
8346doing @math{2^k} recursed multiplies each @m{1/2^{k-2},1/2^(k-2)} the size of
8347the inputs, making it @m{O(N^{k/(k-2)}), O(N^(k/(k-2)))}. This would mean
8348@math{k=7} at @math{O(N^@W{1.4})} would be the first FFT faster than Toom-3.
8349In practice @code{MUL_FFT_THRESHOLD} and @code{SQR_FFT_THRESHOLD} have been
8350found to be in the @math{k=8} range, somewhere between 3000 and 10000 limbs.
8351
8352The way @math{N} is split into @math{2^k} pieces and then @math{2M+k+3} is
8353rounded up to a multiple of @math{2^k} and @code{mp_bits_per_limb} means that
8354when @math{2^k@ge{}@nicode{mp\_bits\_per\_limb}} the effective @math{N} is a
8355multiple of @m{2^{2k-1},2^(2k-1)} bits. The @math{+k+3} means some values of
8356@math{N} just under such a multiple will be rounded to the next. The
8357complexity calculations above assume that a favourable size is used, meaning
8358one which isn't padded through rounding, and it's also assumed that the extra
8359@math{+k+3} bits are negligible at typical FFT sizes.
8360
8361The practical effect of the @m{2^{2k-1},2^(2k-1)} constraint is to introduce a
8362step-effect into measured speeds. For example @math{k=8} will round @math{N}
8363up to a multiple of 32768 bits, so for a 32-bit limb there'll be 512 limb
8364groups of sizes for which @code{mpn_mul_n} runs at the same speed. Or for
8365@math{k=9} groups of 2048 limbs, @math{k=10} groups of 8192 limbs, etc. In
8366practice it's been found each @math{k} is used at quite small multiples of its
8367size constraint and so the step effect is quite noticeable in a time versus
8368size graph.
8369
8370The threshold determinations currently measure at the mid-points of size
8371steps, but this is sub-optimal since at the start of a new step it can happen
8372that it's better to go back to the previous @math{k} for a while. Something
8373more sophisticated for @code{MUL_FFT_TABLE} and @code{SQR_FFT_TABLE} will be
8374needed.
8375
8376
8377@node Other Multiplication, Unbalanced Multiplication, FFT Multiplication, Multiplication Algorithms
8378@subsection Other Multiplication
8379@cindex Toom multiplication
8380
8381The Toom algorithms described above (@pxref{Toom 3-Way Multiplication},
8382@pxref{Toom 4-Way Multiplication}) generalizes to split into an arbitrary
8383number of pieces, as per Knuth section 4.3.3 algorithm C@. This is not
8384currently used. The notes here are merely for interest.
8385
8386In general a split into @math{r+1} pieces is made, and evaluations and
8387pointwise multiplications done at @m{2r+1,2*r+1} points. A 4-way split does 7
8388pointwise multiplies, 5-way does 9, etc. Asymptotically an @math{(r+1)}-way
8389algorithm is @m{O(N^{log(2r+1)/log(r+1)}), O(N^(log(2*r+1)/log(r+1)))}. Only
8390the pointwise multiplications count towards big-@math{O} complexity, but the
8391time spent in the evaluate and interpolate stages grows with @math{r} and has
8392a significant practical impact, with the asymptotic advantage of each @math{r}
8393realized only at bigger and bigger sizes. The overheads grow as
8394@m{O(Nr),O(N*r)}, whereas in an @math{r=2^k} FFT they grow only as @m{O(N \log
8395r), O(N*log(r))}.
8396
8397Knuth algorithm C evaluates at points 0,1,2,@dots{},@m{2r,2*r}, but exercise 4
8398uses @math{-r},@dots{},0,@dots{},@math{r} and the latter saves some small
8399multiplies in the evaluate stage (or rather trades them for additions), and
8400has a further saving of nearly half the interpolate steps. The idea is to
8401separate odd and even final coefficients and then perform algorithm C steps C7
8402and C8 on them separately. The divisors at step C7 become @math{j^2} and the
8403multipliers at C8 become @m{2tj-j^2,2*t*j-j^2}.
8404
8405Splitting odd and even parts through positive and negative points can be
8406thought of as using @math{-1} as a square root of unity. If a 4th root of
8407unity was available then a further split and speedup would be possible, but no
8408such root exists for plain integers. Going to complex integers with
8409@m{i=\sqrt{-1}, i=sqrt(-1)} doesn't help, essentially because in Cartesian
8410form it takes three real multiplies to do a complex multiply. The existence
8411of @m{2^k,2^k'}th roots of unity in a suitable ring or field lets the fast
8412Fourier transform keep splitting and get to @m{O(N \log r), O(N*log(r))}.
8413
8414Floating point FFTs use complex numbers approximating Nth roots of unity.
8415Some processors have special support for such FFTs. But these are not used in
8416GMP since it's very difficult to guarantee an exact result (to some number of
8417bits). An occasional difference of 1 in the last bit might not matter to a
8418typical signal processing algorithm, but is of course of vital importance to
8419GMP.
8420
8421
8422@node Unbalanced Multiplication, , Other Multiplication, Multiplication Algorithms
8423@subsection Unbalanced Multiplication
8424@cindex Unbalanced multiplication
8425
8426Multiplication of operands with different sizes, both below
8427@code{MUL_TOOM22_THRESHOLD} are done with plain schoolbook multiplication
8428(@pxref{Basecase Multiplication}).
8429
8430For really large operands, we invoke FFT directly.
8431
8432For operands between these sizes, we use Toom inspired algorithms suggested by
8433Alberto Zanoni and Marco Bodrato. The idea is to split the operands into
8434polynomials of different degree. GMP currently splits the smaller operand
8435onto 2 coefficients, i.e., a polynomial of degree 1, but the larger operand
8436can be split into 2, 3, or 4 coefficients, i.e., a polynomial of degree 1 to
84373.
8438
8439@c FIXME: This is mighty ugly, but a cleaner @need triggers texinfo bugs that
8440@c screws up layout here and there in the rest of the manual.
8441@c @tex
8442@c \goodbreak
8443@c @end tex
8444@node Division Algorithms, Greatest Common Divisor Algorithms, Multiplication Algorithms, Algorithms
8445@section Division Algorithms
8446@cindex Division algorithms
8447
8448@menu
8449* Single Limb Division::
8450* Basecase Division::
8451* Divide and Conquer Division::
8452* Block-Wise Barrett Division::
8453* Exact Division::
8454* Exact Remainder::
8455* Small Quotient Division::
8456@end menu
8457
8458
8459@node Single Limb Division, Basecase Division, Division Algorithms, Division Algorithms
8460@subsection Single Limb Division
8461
8462N@cross{}1 division is implemented using repeated 2@cross{}1 divisions from
8463high to low, either with a hardware divide instruction or a multiplication by
8464inverse, whichever is best on a given CPU.
8465
8466The multiply by inverse follows ``Improved division by invariant integers'' by
8467M@"oller and Granlund (@pxref{References}) and is implemented as
8468@code{udiv_qrnnd_preinv} in @file{gmp-impl.h}. The idea is to have a
8469fixed-point approximation to @math{1/d} (see @code{invert_limb}) and then
8470multiply by the high limb (plus one bit) of the dividend to get a quotient
8471@math{q}. With @math{d} normalized (high bit set), @math{q} is no more than 1
8472too small. Subtracting @m{qd,q*d} from the dividend gives a remainder, and
8473reveals whether @math{q} or @math{q-1} is correct.
8474
8475The result is a division done with two multiplications and four or five
8476arithmetic operations. On CPUs with low latency multipliers this can be much
8477faster than a hardware divide, though the cost of calculating the inverse at
8478the start may mean it's only better on inputs bigger than say 4 or 5 limbs.
8479
8480When a divisor must be normalized, either for the generic C
8481@code{__udiv_qrnnd_c} or the multiply by inverse, the division performed is
8482actually @m{a2^k,a*2^k} by @m{d2^k,d*2^k} where @math{a} is the dividend and
8483@math{k} is the power necessary to have the high bit of @m{d2^k,d*2^k} set.
8484The bit shifts for the dividend are usually accomplished ``on the fly''
8485meaning by extracting the appropriate bits at each step. Done this way the
8486quotient limbs come out aligned ready to store. When only the remainder is
8487wanted, an alternative is to take the dividend limbs unshifted and calculate
8488@m{r = a \bmod d2^k, r = a mod d*2^k} followed by an extra final step @m{r2^k
8489\bmod d2^k, r*2^k mod d*2^k}. This can help on CPUs with poor bit shifts or
8490few registers.
8491
8492The multiply by inverse can be done two limbs at a time. The calculation is
8493basically the same, but the inverse is two limbs and the divisor treated as if
8494padded with a low zero limb. This means more work, since the inverse will
8495need a 2@cross{}2 multiply, but the four 1@cross{}1s to do that are
8496independent and can therefore be done partly or wholly in parallel. Likewise
8497for a 2@cross{}1 calculating @m{qd,q*d}. The net effect is to process two
8498limbs with roughly the same two multiplies worth of latency that one limb at a
8499time gives. This extends to 3 or 4 limbs at a time, though the extra work to
8500apply the inverse will almost certainly soon reach the limits of multiplier
8501throughput.
8502
8503A similar approach in reverse can be taken to process just half a limb at a
8504time if the divisor is only a half limb. In this case the 1@cross{}1 multiply
8505for the inverse effectively becomes two @m{{1\over2}\times1, (1/2)x1} for each
8506limb, which can be a saving on CPUs with a fast half limb multiply, or in fact
8507if the only multiply is a half limb, and especially if it's not pipelined.
8508
8509
8510@node Basecase Division, Divide and Conquer Division, Single Limb Division, Division Algorithms
8511@subsection Basecase Division
8512
8513Basecase N@cross{}M division is like long division done by hand, but in base
8514@m{2\GMPraise{@code{mp\_bits\_per\_limb}}, 2^mp_bits_per_limb}. See Knuth
8515section 4.3.1 algorithm D, and @file{mpn/generic/sb_divrem_mn.c}.
8516
8517Briefly stated, while the dividend remains larger than the divisor, a high
8518quotient limb is formed and the N@cross{}1 product @m{qd,q*d} subtracted at
8519the top end of the dividend. With a normalized divisor (most significant bit
8520set), each quotient limb can be formed with a 2@cross{}1 division and a
85211@cross{}1 multiplication plus some subtractions. The 2@cross{}1 division is
8522by the high limb of the divisor and is done either with a hardware divide or a
8523multiply by inverse (the same as in @ref{Single Limb Division}) whichever is
8524faster. Such a quotient is sometimes one too big, requiring an addback of the
8525divisor, but that happens rarely.
8526
8527With Q=N@minus{}M being the number of quotient limbs, this is an
8528@m{O(QM),O(Q*M)} algorithm and will run at a speed similar to a basecase
8529Q@cross{}M multiplication, differing in fact only in the extra multiply and
8530divide for each of the Q quotient limbs.
8531
8532
8533@node Divide and Conquer Division, Block-Wise Barrett Division, Basecase Division, Division Algorithms
8534@subsection Divide and Conquer Division
8535
8536For divisors larger than @code{DC_DIV_QR_THRESHOLD}, division is done by dividing.
8537Or to be precise by a recursive divide and conquer algorithm based on work by
8538Moenck and Borodin, Jebelean, and Burnikel and Ziegler (@pxref{References}).
8539
8540The algorithm consists essentially of recognising that a 2N@cross{}N division
8541can be done with the basecase division algorithm (@pxref{Basecase Division}),
8542but using N/2 limbs as a base, not just a single limb. This way the
8543multiplications that arise are (N/2)@cross{}(N/2) and can take advantage of
8544Karatsuba and higher multiplication algorithms (@pxref{Multiplication
8545Algorithms}). The two ``digits'' of the quotient are formed by recursive
8546N@cross{}(N/2) divisions.
8547
8548If the (N/2)@cross{}(N/2) multiplies are done with a basecase multiplication
8549then the work is about the same as a basecase division, but with more function
8550call overheads and with some subtractions separated from the multiplies.
8551These overheads mean that it's only when N/2 is above
8552@code{MUL_TOOM22_THRESHOLD} that divide and conquer is of use.
8553
8554@code{DC_DIV_QR_THRESHOLD} is based on the divisor size N, so it will be somewhere
8555above twice @code{MUL_TOOM22_THRESHOLD}, but how much above depends on the
8556CPU@. An optimized @code{mpn_mul_basecase} can lower @code{DC_DIV_QR_THRESHOLD} a
8557little by offering a ready-made advantage over repeated @code{mpn_submul_1}
8558calls.
8559
8560Divide and conquer is asymptotically @m{O(M(N)\log N),O(M(N)*log(N))} where
8561@math{M(N)} is the time for an N@cross{}N multiplication done with FFTs. The
8562actual time is a sum over multiplications of the recursed sizes, as can be
8563seen near the end of section 2.2 of Burnikel and Ziegler. For example, within
8564the Toom-3 range, divide and conquer is @m{2.63M(N), 2.63*M(N)}. With higher
8565algorithms the @math{M(N)} term improves and the multiplier tends to @m{\log
8566N, log(N)}. In practice, at moderate to large sizes, a 2N@cross{}N division
8567is about 2 to 4 times slower than an N@cross{}N multiplication.
8568
8569
8570@node Block-Wise Barrett Division, Exact Division, Divide and Conquer Division, Division Algorithms
8571@subsection Block-Wise Barrett Division
8572
8573For the largest divisions, a block-wise Barrett division algorithm is used.
8574Here, the divisor is inverted to a precision determined by the relative size of
8575the dividend and divisor. Blocks of quotient limbs are then generated by
8576multiplying blocks from the dividend by the inverse.
8577
8578Our block-wise algorithm computes a smaller inverse than in the plain Barrett
8579algorithm. For a @math{2n/n} division, the inverse will be just @m{\lceil n/2
8580\rceil, ceil(n/2)} limbs.
8581
8582
8583@node Exact Division, Exact Remainder, Block-Wise Barrett Division, Division Algorithms
8584@subsection Exact Division
8585
8586
8587A so-called exact division is when the dividend is known to be an exact
8588multiple of the divisor. Jebelean's exact division algorithm uses this
8589knowledge to make some significant optimizations (@pxref{References}).
8590
8591The idea can be illustrated in decimal for example with 368154 divided by
8592543. Because the low digit of the dividend is 4, the low digit of the
8593quotient must be 8. This is arrived at from @m{4 \mathord{\times} 7 \bmod 10,
85944*7 mod 10}, using the fact 7 is the modular inverse of 3 (the low digit of
8595the divisor), since @m{3 \mathord{\times} 7 \mathop{\equiv} 1 \bmod 10, 3*7
8596@equiv{} 1 mod 10}. So @m{8\mathord{\times}543 = 4344,8*543=4344} can be
8597subtracted from the dividend leaving 363810. Notice the low digit has become
8598zero.
8599
8600The procedure is repeated at the second digit, with the next quotient digit 7
8601(@m{1 \mathord{\times} 7 \bmod 10, 7 @equiv{} 1*7 mod 10}), subtracting
8602@m{7\mathord{\times}543 = 3801,7*543=3801}, leaving 325800. And finally at
8603the third digit with quotient digit 6 (@m{8 \mathord{\times} 7 \bmod 10, 8*7
8604mod 10}), subtracting @m{6\mathord{\times}543 = 3258,6*543=3258} leaving 0.
8605So the quotient is 678.
8606
8607Notice however that the multiplies and subtractions don't need to extend past
8608the low three digits of the dividend, since that's enough to determine the
8609three quotient digits. For the last quotient digit no subtraction is needed
8610at all. On a 2N@cross{}N division like this one, only about half the work of
8611a normal basecase division is necessary.
8612
8613For an N@cross{}M exact division producing Q=N@minus{}M quotient limbs, the
8614saving over a normal basecase division is in two parts. Firstly, each of the
8615Q quotient limbs needs only one multiply, not a 2@cross{}1 divide and
8616multiply. Secondly, the crossproducts are reduced when @math{Q>M} to
8617@m{QM-M(M+1)/2,Q*M-M*(M+1)/2}, or when @math{Q@le{}M} to @m{Q(Q-1)/2,
8618Q*(Q-1)/2}. Notice the savings are complementary. If Q is big then many
8619divisions are saved, or if Q is small then the crossproducts reduce to a small
8620number.
8621
8622The modular inverse used is calculated efficiently by @code{binvert_limb} in
8623@file{gmp-impl.h}. This does four multiplies for a 32-bit limb, or six for a
862464-bit limb. @file{tune/modlinv.c} has some alternate implementations that
8625might suit processors better at bit twiddling than multiplying.
8626
8627The sub-quadratic exact division described by Jebelean in ``Exact Division
8628with Karatsuba Complexity'' is not currently implemented. It uses a
8629rearrangement similar to the divide and conquer for normal division
8630(@pxref{Divide and Conquer Division}), but operating from low to high. A
8631further possibility not currently implemented is ``Bidirectional Exact Integer
8632Division'' by Krandick and Jebelean which forms quotient limbs from both the
8633high and low ends of the dividend, and can halve once more the number of
8634crossproducts needed in a 2N@cross{}N division.
8635
8636A special case exact division by 3 exists in @code{mpn_divexact_by3},
8637supporting Toom-3 multiplication and @code{mpq} canonicalizations. It forms
8638quotient digits with a multiply by the modular inverse of 3 (which is
8639@code{0xAA..AAB}) and uses two comparisons to determine a borrow for the next
8640limb. The multiplications don't need to be on the dependent chain, as long as
8641the effect of the borrows is applied, which can help chips with pipelined
8642multipliers.
8643
8644
8645@node Exact Remainder, Small Quotient Division, Exact Division, Division Algorithms
8646@subsection Exact Remainder
8647@cindex Exact remainder
8648
8649If the exact division algorithm is done with a full subtraction at each stage
8650and the dividend isn't a multiple of the divisor, then low zero limbs are
8651produced but with a remainder in the high limbs. For dividend @math{a},
8652divisor @math{d}, quotient @math{q}, and @m{b = 2
8653\GMPraise{@code{mp\_bits\_per\_limb}}, b = 2^mp_bits_per_limb}, this remainder
8654@math{r} is of the form
8655@tex
8656$$ a = qd + r b^n $$
8657@end tex
8658@ifnottex
8659
8660@example
8661a = q*d + r*b^n
8662@end example
8663
8664@end ifnottex
8665@math{n} represents the number of zero limbs produced by the subtractions,
8666that being the number of limbs produced for @math{q}. @math{r} will be in the
8667range @math{0@le{}r<d} and can be viewed as a remainder, but one shifted up by
8668a factor of @math{b^n}.
8669
8670Carrying out full subtractions at each stage means the same number of cross
8671products must be done as a normal division, but there's still some single limb
8672divisions saved. When @math{d} is a single limb some simplifications arise,
8673providing good speedups on a number of processors.
8674
8675The functions @code{mpn_divexact_by3}, @code{mpn_modexact_1_odd} and the
8676internal @code{mpn_redc_X} functions differ subtly in how they return @math{r},
8677leading to some negations in the above formula, but all are essentially the
8678same.
8679
8680@cindex Divisibility algorithm
8681@cindex Congruence algorithm
8682Clearly @math{r} is zero when @math{a} is a multiple of @math{d}, and this
8683leads to divisibility or congruence tests which are potentially more efficient
8684than a normal division.
8685
8686The factor of @math{b^n} on @math{r} can be ignored in a GCD when @math{d} is
8687odd, hence the use of @code{mpn_modexact_1_odd} by @code{mpn_gcd_1} and
8688@code{mpz_kronecker_ui} etc (@pxref{Greatest Common Divisor Algorithms}).
8689
8690Montgomery's REDC method for modular multiplications uses operands of the form
8691of @m{xb^{-n}, x*b^-n} and @m{yb^{-n}, y*b^-n} and on calculating @m{(xb^{-n})
8692(yb^{-n}), (x*b^-n)*(y*b^-n)} uses the factor of @math{b^n} in the exact
8693remainder to reach a product in the same form @m{(xy)b^{-n}, (x*y)*b^-n}
8694(@pxref{Modular Powering Algorithm}).
8695
8696Notice that @math{r} generally gives no useful information about the ordinary
8697remainder @math{a @bmod d} since @math{b^n @bmod d} could be anything. If
8698however @math{b^n @equiv{} 1 @bmod d}, then @math{r} is the negative of the
8699ordinary remainder. This occurs whenever @math{d} is a factor of
8700@math{b^n-1}, as for example with 3 in @code{mpn_divexact_by3}. For a 32 or
870164 bit limb other such factors include 5, 17 and 257, but no particular use
8702has been found for this.
8703
8704
8705@node Small Quotient Division, , Exact Remainder, Division Algorithms
8706@subsection Small Quotient Division
8707
8708An N@cross{}M division where the number of quotient limbs Q=N@minus{}M is
8709small can be optimized somewhat.
8710
8711An ordinary basecase division normalizes the divisor by shifting it to make
8712the high bit set, shifting the dividend accordingly, and shifting the
8713remainder back down at the end of the calculation. This is wasteful if only a
8714few quotient limbs are to be formed. Instead a division of just the top
8715@m{\rm2Q,2*Q} limbs of the dividend by the top Q limbs of the divisor can be
8716used to form a trial quotient. This requires only those limbs normalized, not
8717the whole of the divisor and dividend.
8718
8719A multiply and subtract then applies the trial quotient to the M@minus{}Q
8720unused limbs of the divisor and N@minus{}Q dividend limbs (which includes Q
8721limbs remaining from the trial quotient division). The starting trial
8722quotient can be 1 or 2 too big, but all cases of 2 too big and most cases of 1
8723too big are detected by first comparing the most significant limbs that will
8724arise from the subtraction. An addback is done if the quotient still turns
8725out to be 1 too big.
8726
8727This whole procedure is essentially the same as one step of the basecase
8728algorithm done in a Q limb base, though with the trial quotient test done only
8729with the high limbs, not an entire Q limb ``digit'' product. The correctness
8730of this weaker test can be established by following the argument of Knuth
8731section 4.3.1 exercise 20 but with the @m{v_2 \GMPhat q > b \GMPhat r
8732+ u_2, v2*q>b*r+u2} condition appropriately relaxed.
8733
8734
8735@need 1000
8736@node Greatest Common Divisor Algorithms, Powering Algorithms, Division Algorithms, Algorithms
8737@section Greatest Common Divisor
8738@cindex Greatest common divisor algorithms
8739@cindex GCD algorithms
8740
8741@menu
8742* Binary GCD::
8743* Lehmer's Algorithm::
8744* Subquadratic GCD::
8745* Extended GCD::
8746* Jacobi Symbol::
8747@end menu
8748
8749
8750@node Binary GCD, Lehmer's Algorithm, Greatest Common Divisor Algorithms, Greatest Common Divisor Algorithms
8751@subsection Binary GCD
8752
8753At small sizes GMP uses an @math{O(N^2)} binary style GCD@. This is described
8754in many textbooks, for example Knuth section 4.5.2 algorithm B@. It simply
8755consists of successively reducing odd operands @math{a} and @math{b} using
8756
8757@quotation
8758@math{a,b = @abs{}(a-b),@min{}(a,b)} @*
8759strip factors of 2 from @math{a}
8760@end quotation
8761
8762The Euclidean GCD algorithm, as per Knuth algorithms E and A, repeatedly
8763computes the quotient @m{q = \lfloor a/b \rfloor, q = floor(a/b)} and replaces
8764@math{a,b} by @math{v, u - q v}. The binary algorithm has so far been found to
8765be faster than the Euclidean algorithm everywhere. One reason the binary
8766method does well is that the implied quotient at each step is usually small,
8767so often only one or two subtractions are needed to get the same effect as a
8768division. Quotients 1, 2 and 3 for example occur 67.7% of the time, see Knuth
8769section 4.5.3 Theorem E.
8770
8771When the implied quotient is large, meaning @math{b} is much smaller than
8772@math{a}, then a division is worthwhile. This is the basis for the initial
8773@math{a @bmod b} reductions in @code{mpn_gcd} and @code{mpn_gcd_1} (the latter
8774for both N@cross{}1 and 1@cross{}1 cases). But after that initial reduction,
8775big quotients occur too rarely to make it worth checking for them.
8776
8777@sp 1
8778The final @math{1@cross{}1} GCD in @code{mpn_gcd_1} is done in the generic C
8779code as described above. For two N-bit operands, the algorithm takes about
87800.68 iterations per bit. For optimum performance some attention needs to be
8781paid to the way the factors of 2 are stripped from @math{a}.
8782
8783Firstly it may be noted that in twos complement the number of low zero bits on
8784@math{a-b} is the same as @math{b-a}, so counting or testing can begin on
8785@math{a-b} without waiting for @math{@abs{}(a-b)} to be determined.
8786
8787A loop stripping low zero bits tends not to branch predict well, since the
8788condition is data dependent. But on average there's only a few low zeros, so
8789an option is to strip one or two bits arithmetically then loop for more (as
8790done for AMD K6). Or use a lookup table to get a count for several bits then
8791loop for more (as done for AMD K7). An alternative approach is to keep just
8792one of @math{a} or @math{b} odd and iterate
8793
8794@quotation
8795@math{a,b = @abs{}(a-b), @min{}(a,b)} @*
8796@math{a = a/2} if even @*
8797@math{b = b/2} if even
8798@end quotation
8799
8800This requires about 1.25 iterations per bit, but stripping of a single bit at
8801each step avoids any branching. Repeating the bit strip reduces to about 0.9
8802iterations per bit, which may be a worthwhile tradeoff.
8803
8804Generally with the above approaches a speed of perhaps 6 cycles per bit can be
8805achieved, which is still not terribly fast with for instance a 64-bit GCD
8806taking nearly 400 cycles. It's this sort of time which means it's not usually
8807advantageous to combine a set of divisibility tests into a GCD.
8808
8809Currently, the binary algorithm is used for GCD only when @math{N < 3}.
8810
8811@node Lehmer's Algorithm, Subquadratic GCD, Binary GCD, Greatest Common Divisor Algorithms
8812@comment node-name, next, previous, up
8813@subsection Lehmer's algorithm
8814
8815Lehmer's improvement of the Euclidean algorithms is based on the observation
8816that the initial part of the quotient sequence depends only on the most
8817significant parts of the inputs. The variant of Lehmer's algorithm used in GMP
8818splits off the most significant two limbs, as suggested, e.g., in ``A
8819Double-Digit Lehmer-Euclid Algorithm'' by Jebelean (@pxref{References}). The
8820quotients of two double-limb inputs are collected as a 2 by 2 matrix with
8821single-limb elements. This is done by the function @code{mpn_hgcd2}. The
8822resulting matrix is applied to the inputs using @code{mpn_mul_1} and
8823@code{mpn_submul_1}. Each iteration usually reduces the inputs by almost one
8824limb. In the rare case of a large quotient, no progress can be made by
8825examining just the most significant two limbs, and the quotient is computed
8826using plain division.
8827
8828The resulting algorithm is asymptotically @math{O(N^2)}, just as the Euclidean
8829algorithm and the binary algorithm. The quadratic part of the work are
8830the calls to @code{mpn_mul_1} and @code{mpn_submul_1}. For small sizes, the
8831linear work is also significant. There are roughly @math{N} calls to the
8832@code{mpn_hgcd2} function. This function uses a couple of important
8833optimizations:
8834
8835@itemize
8836@item
8837It uses the same relaxed notion of correctness as @code{mpn_hgcd} (see next
8838section). This means that when called with the most significant two limbs of
8839two large numbers, the returned matrix does not always correspond exactly to
8840the initial quotient sequence for the two large numbers; the final quotient
8841may sometimes be one off.
8842
8843@item
8844It takes advantage of the fact the quotients are usually small. The division
8845operator is not used, since the corresponding assembler instruction is very
8846slow on most architectures. (This code could probably be improved further, it
8847uses many branches that are unfriendly to prediction).
8848
8849@item
8850It switches from double-limb calculations to single-limb calculations half-way
8851through, when the input numbers have been reduced in size from two limbs to
8852one and a half.
8853
8854@end itemize
8855
8856@node Subquadratic GCD, Extended GCD, Lehmer's Algorithm, Greatest Common Divisor Algorithms
8857@subsection Subquadratic GCD
8858
8859For inputs larger than @code{GCD_DC_THRESHOLD}, GCD is computed via the HGCD
8860(Half GCD) function, as a generalization to Lehmer's algorithm.
8861
8862Let the inputs @math{a,b} be of size @math{N} limbs each. Put @m{S=\lfloor N/2
8863\rfloor + 1, S = floor(N/2) + 1}. Then HGCD(a,b) returns a transformation
8864matrix @math{T} with non-negative elements, and reduced numbers @math{(c;d) =
8865T^{-1} (a;b)}. The reduced numbers @math{c,d} must be larger than @math{S}
8866limbs, while their difference @math{abs(c-d)} must fit in @math{S} limbs. The
8867matrix elements will also be of size roughly @math{N/2}.
8868
8869The HGCD base case uses Lehmer's algorithm, but with the above stop condition
8870that returns reduced numbers and the corresponding transformation matrix
8871half-way through. For inputs larger than @code{HGCD_THRESHOLD}, HGCD is
8872computed recursively, using the divide and conquer algorithm in ``On
8873Sch@"onhage's algorithm and subquadratic integer GCD computation'' by M@"oller
8874(@pxref{References}). The recursive algorithm consists of these main
8875steps.
8876
8877@itemize
8878
8879@item
8880Call HGCD recursively, on the most significant @math{N/2} limbs. Apply the
8881resulting matrix @math{T_1} to the full numbers, reducing them to a size just
8882above @math{3N/2}.
8883
8884@item
8885Perform a small number of division or subtraction steps to reduce the numbers
8886to size below @math{3N/2}. This is essential mainly for the unlikely case of
8887large quotients.
8888
8889@item
8890Call HGCD recursively, on the most significant @math{N/2} limbs of the reduced
8891numbers. Apply the resulting matrix @math{T_2} to the full numbers, reducing
8892them to a size just above @math{N/2}.
8893
8894@item
8895Compute @math{T = T_1 T_2}.
8896
8897@item
8898Perform a small number of division and subtraction steps to satisfy the
8899requirements, and return.
8900@end itemize
8901
8902GCD is then implemented as a loop around HGCD, similarly to Lehmer's
8903algorithm. Where Lehmer repeatedly chops off the top two limbs, calls
8904@code{mpn_hgcd2}, and applies the resulting matrix to the full numbers, the
8905sub-quadratic GCD chops off the most significant third of the limbs (the
8906proportion is a tuning parameter, and @math{1/3} seems to be more efficient
8907than, e.g, @math{1/2}), calls @code{mpn_hgcd}, and applies the resulting
8908matrix. Once the input numbers are reduced to size below
8909@code{GCD_DC_THRESHOLD}, Lehmer's algorithm is used for the rest of the work.
8910
8911The asymptotic running time of both HGCD and GCD is @m{O(M(N)\log N),O(M(N)*log(N))},
8912where @math{M(N)} is the time for multiplying two @math{N}-limb numbers.
8913
8914@comment node-name, next, previous, up
8915
8916@node Extended GCD, Jacobi Symbol, Subquadratic GCD, Greatest Common Divisor Algorithms
8917@subsection Extended GCD
8918
8919The extended GCD function, or GCDEXT, calculates @math{@gcd{}(a,b)} and also
8920cofactors @math{x} and @math{y} satisfying @m{ax+by=\gcd(a@C{}b),
8921a*x+b*y=gcd(a@C{}b)}. All the algorithms used for plain GCD are extended to
8922handle this case. The binary algorithm is used only for single-limb GCDEXT.
8923Lehmer's algorithm is used for sizes up to @code{GCDEXT_DC_THRESHOLD}. Above
8924this threshold, GCDEXT is implemented as a loop around HGCD, but with more
8925book-keeping to keep track of the cofactors. This gives the same asymptotic
8926running time as for GCD and HGCD, @m{O(M(N)\log N),O(M(N)*log(N))}
8927
8928One difference to plain GCD is that while the inputs @math{a} and @math{b} are
8929reduced as the algorithm proceeds, the cofactors @math{x} and @math{y} grow in
8930size. This makes the tuning of the chopping-point more difficult. The current
8931code chops off the most significant half of the inputs for the call to HGCD in
8932the first iteration, and the most significant two thirds for the remaining
8933calls. This strategy could surely be improved. Also the stop condition for the
8934loop, where Lehmer's algorithm is invoked once the inputs are reduced below
8935@code{GCDEXT_DC_THRESHOLD}, could maybe be improved by taking into account the
8936current size of the cofactors.
8937
8938@node Jacobi Symbol, , Extended GCD, Greatest Common Divisor Algorithms
8939@subsection Jacobi Symbol
8940@cindex Jacobi symbol algorithm
8941
8942@c Editor Note: I don't see other people defining the inputs, it would be nice
8943@c here because the code uses (a/b) where other references use (n/k)
8944
8945Jacobi symbol @m{\left(a \over b\right), (@var{a}/@var{b})}
8946
8947Initially if either operand fits in a single limb, a reduction is done with
8948either @code{mpn_mod_1} or @code{mpn_modexact_1_odd}, followed by the binary
8949algorithm on a single limb. The binary algorithm is well suited to a single limb,
8950and the whole calculation in this case is quite efficient.
8951
8952For inputs larger than @code{GCD_DC_THRESHOLD}, @code{mpz_jacobi},
8953@code{mpz_legendre} and @code{mpz_kronecker} are computed via the HGCD (Half
8954GCD) function, as a generalization to Lehmer's algorithm.
8955
8956Most GCD algorithms reduce @math{a} and @math{b} by repeatatily computing the
8957quotient @m{q = \lfloor a/b \rfloor, q = floor(a/b)} and iteratively replacing
8958
8959@c Couldn't figure out macros with commas.
8960@tex
8961$$ a, b = b, a - q * b$$
8962@end tex
8963@ifnottex
8964@math{a, b = b, a - q * b}
8965@end ifnottex
8966
8967Different algorithms use different methods for calculating q, but the core
8968algorithm is the same if we use @ref{Lehmer's Algorithm} or
8969@ref{Subquadratic GCD, HGCD}.
8970
8971At each step it is possible to compute if the reduction inverts the Jacobi
8972symbol based on the two least significant bits of @var{a} and @var{b}. For
8973more details see ``Efficient computation of the Jacobi symbol'' by
8974M@"oller (@pxref{References}).
8975
8976A small set of bits is thus used to track state
8977@itemize
8978@item
8979current sign of result (1 bit)
8980
8981@item
8982two least significant bits of @var{a} and @var{b} (4 bits)
8983
8984@item
8985a pointer to which input is currently the denominator (1 bit)
8986@end itemize
8987
8988In all the routines sign changes for the result are accumulated using fast bit
8989twiddling which avoids conditional jumps.
8990
8991The final result is calculated after verifying the inputs are coprime (GCD = 1)
8992by raising @m{(-1)^e,(-1)^e}
8993
8994Much of the HGCD code is shared directly with the HGCD implementations, such
8995as the 2x2 matrix calculation, @xref{Lehmer's Algorithm} basecase and
8996@code{GCD_DC_THRESHOLD}.
8997
8998The asymptotic running time is @m{O(M(N)\log N),O(M(N)*log(N))}, where
8999@math{M(N)} is the time for multiplying two @math{N}-limb numbers.
9000
9001@need 1000
9002@node Powering Algorithms, Root Extraction Algorithms, Greatest Common Divisor Algorithms, Algorithms
9003@section Powering Algorithms
9004@cindex Powering algorithms
9005
9006@menu
9007* Normal Powering Algorithm::
9008* Modular Powering Algorithm::
9009@end menu
9010
9011
9012@node Normal Powering Algorithm, Modular Powering Algorithm, Powering Algorithms, Powering Algorithms
9013@subsection Normal Powering
9014
9015Normal @code{mpz} or @code{mpf} powering uses a simple binary algorithm,
9016successively squaring and then multiplying by the base when a 1 bit is seen in
9017the exponent, as per Knuth section 4.6.3. The ``left to right''
9018variant described there is used rather than algorithm A, since it's just as
9019easy and can be done with somewhat less temporary memory.
9020
9021
9022@node Modular Powering Algorithm, , Normal Powering Algorithm, Powering Algorithms
9023@subsection Modular Powering
9024
9025Modular powering is implemented using a @math{2^k}-ary sliding window
9026algorithm, as per ``Handbook of Applied Cryptography'' algorithm 14.85
9027(@pxref{References}). @math{k} is chosen according to the size of the
9028exponent. Larger exponents use larger values of @math{k}, the choice being
9029made to minimize the average number of multiplications that must supplement
9030the squaring.
9031
9032The modular multiplies and squarings use either a simple division or the REDC
9033method by Montgomery (@pxref{References}). REDC is a little faster,
9034essentially saving N single limb divisions in a fashion similar to an exact
9035remainder (@pxref{Exact Remainder}).
9036
9037
9038@node Root Extraction Algorithms, Radix Conversion Algorithms, Powering Algorithms, Algorithms
9039@section Root Extraction Algorithms
9040@cindex Root extraction algorithms
9041
9042@menu
9043* Square Root Algorithm::
9044* Nth Root Algorithm::
9045* Perfect Square Algorithm::
9046* Perfect Power Algorithm::
9047@end menu
9048
9049
9050@node Square Root Algorithm, Nth Root Algorithm, Root Extraction Algorithms, Root Extraction Algorithms
9051@subsection Square Root
9052@cindex Square root algorithm
9053@cindex Karatsuba square root algorithm
9054
9055Square roots are taken using the ``Karatsuba Square Root'' algorithm by Paul
9056Zimmermann (@pxref{References}).
9057
9058An input @math{n} is split into four parts of @math{k} bits each, so with
9059@math{b=2^k} we have @m{n = a_3b^3 + a_2b^2 + a_1b + a_0, n = a3*b^3 + a2*b^2
9060+ a1*b + a0}. Part @ms{a,3} must be ``normalized'' so that either the high or
9061second highest bit is set. In GMP, @math{k} is kept on a limb boundary and
9062the input is left shifted (by an even number of bits) to normalize.
9063
9064The square root of the high two parts is taken, by recursive application of
9065the algorithm (bottoming out in a one-limb Newton's method),
9066@tex
9067$$ s',r' = \mathop{\rm sqrtrem} \> (a_3b + a_2) $$
9068@end tex
9069@ifnottex
9070
9071@example
9072s1,r1 = sqrtrem (a3*b + a2)
9073@end example
9074
9075@end ifnottex
9076This is an approximation to the desired root and is extended by a division to
9077give @math{s},@math{r},
9078@tex
9079$$\eqalign{
9080q,u &= \mathop{\rm divrem} \> (r'b + a_1, 2s') \cr
9081s &= s'b + q \cr
9082r &= ub + a_0 - q^2
9083}$$
9084@end tex
9085@ifnottex
9086
9087@example
9088q,u = divrem (r1*b + a1, 2*s1)
9089s = s1*b + q
9090r = u*b + a0 - q^2
9091@end example
9092
9093@end ifnottex
9094The normalization requirement on @ms{a,3} means at this point @math{s} is
9095either correct or 1 too big. @math{r} is negative in the latter case, so
9096@tex
9097$$\eqalign{
9098\mathop{\rm if} \; r &< 0 \; \mathop{\rm then} \cr
9099r &\leftarrow r + 2s - 1 \cr
9100s &\leftarrow s - 1
9101}$$
9102@end tex
9103@ifnottex
9104
9105@example
9106if r < 0 then
9107 r = r + 2*s - 1
9108 s = s - 1
9109@end example
9110
9111@end ifnottex
9112The algorithm is expressed in a divide and conquer form, but as noted in the
9113paper it can also be viewed as a discrete variant of Newton's method, or as a
9114variation on the schoolboy method (no longer taught) for square roots two
9115digits at a time.
9116
9117If the remainder @math{r} is not required then usually only a few high limbs
9118of @math{r} and @math{u} need to be calculated to determine whether an
9119adjustment to @math{s} is required. This optimization is not currently
9120implemented.
9121
9122In the Karatsuba multiplication range this algorithm is @m{O({3\over2}
9123M(N/2)),O(1.5*M(N/2))}, where @math{M(n)} is the time to multiply two numbers
9124of @math{n} limbs. In the FFT multiplication range this grows to a bound of
9125@m{O(6 M(N/2)),O(6*M(N/2))}. In practice a factor of about 1.5 to 1.8 is
9126found in the Karatsuba and Toom-3 ranges, growing to 2 or 3 in the FFT range.
9127
9128The algorithm does all its calculations in integers and the resulting
9129@code{mpn_sqrtrem} is used for both @code{mpz_sqrt} and @code{mpf_sqrt}.
9130The extended precision given by @code{mpf_sqrt_ui} is obtained by
9131padding with zero limbs.
9132
9133
9134@node Nth Root Algorithm, Perfect Square Algorithm, Square Root Algorithm, Root Extraction Algorithms
9135@subsection Nth Root
9136@cindex Root extraction algorithm
9137@cindex Nth root algorithm
9138
9139Integer Nth roots are taken using Newton's method with the following
9140iteration, where @math{A} is the input and @math{n} is the root to be taken.
9141@tex
9142$$a_{i+1} = {1\over n} \left({A \over a_i^{n-1}} + (n-1)a_i \right)$$
9143@end tex
9144@ifnottex
9145
9146@example
9147 1 A
9148a[i+1] = - * ( --------- + (n-1)*a[i] )
9149 n a[i]^(n-1)
9150@end example
9151
9152@end ifnottex
9153The initial approximation @m{a_1,a[1]} is generated bitwise by successively
9154powering a trial root with or without new 1 bits, aiming to be just above the
9155true root. The iteration converges quadratically when started from a good
9156approximation. When @math{n} is large more initial bits are needed to get
9157good convergence. The current implementation is not particularly well
9158optimized.
9159
9160
9161@node Perfect Square Algorithm, Perfect Power Algorithm, Nth Root Algorithm, Root Extraction Algorithms
9162@subsection Perfect Square
9163@cindex Perfect square algorithm
9164
9165A significant fraction of non-squares can be quickly identified by checking
9166whether the input is a quadratic residue modulo small integers.
9167
9168@code{mpz_perfect_square_p} first tests the input mod 256, which means just
9169examining the low byte. Only 44 different values occur for squares mod 256,
9170so 82.8% of inputs can be immediately identified as non-squares.
9171
9172On a 32-bit system similar tests are done mod 9, 5, 7, 13 and 17, for a total
917399.25% of inputs identified as non-squares. On a 64-bit system 97 is tested
9174too, for a total 99.62%.
9175
9176These moduli are chosen because they're factors of @math{2^@W{24}-1} (or
9177@math{2^@W{48}-1} for 64-bits), and such a remainder can be quickly taken just
9178using additions (see @code{mpn_mod_34lsub1}).
9179
9180When nails are in use moduli are instead selected by the @file{gen-psqr.c}
9181program and applied with an @code{mpn_mod_1}. The same @math{2^@W{24}-1} or
9182@math{2^@W{48}-1} could be done with nails using some extra bit shifts, but
9183this is not currently implemented.
9184
9185In any case each modulus is applied to the @code{mpn_mod_34lsub1} or
9186@code{mpn_mod_1} remainder and a table lookup identifies non-squares. By
9187using a ``modexact'' style calculation, and suitably permuted tables, just one
9188multiply each is required, see the code for details. Moduli are also combined
9189to save operations, so long as the lookup tables don't become too big.
9190@file{gen-psqr.c} does all the pre-calculations.
9191
9192A square root must still be taken for any value that passes these tests, to
9193verify it's really a square and not one of the small fraction of non-squares
9194that get through (i.e.@: a pseudo-square to all the tested bases).
9195
9196Clearly more residue tests could be done, @code{mpz_perfect_square_p} only
9197uses a compact and efficient set. Big inputs would probably benefit from more
9198residue testing, small inputs might be better off with less. The assumed
9199distribution of squares versus non-squares in the input would affect such
9200considerations.
9201
9202
9203@node Perfect Power Algorithm, , Perfect Square Algorithm, Root Extraction Algorithms
9204@subsection Perfect Power
9205@cindex Perfect power algorithm
9206
9207Detecting perfect powers is required by some factorization algorithms.
9208Currently @code{mpz_perfect_power_p} is implemented using repeated Nth root
9209extractions, though naturally only prime roots need to be considered.
9210(@xref{Nth Root Algorithm}.)
9211
9212If a prime divisor @math{p} with multiplicity @math{e} can be found, then only
9213roots which are divisors of @math{e} need to be considered, much reducing the
9214work necessary. To this end divisibility by a set of small primes is checked.
9215
9216
9217@node Radix Conversion Algorithms, Other Algorithms, Root Extraction Algorithms, Algorithms
9218@section Radix Conversion
9219@cindex Radix conversion algorithms
9220
9221Radix conversions are less important than other algorithms. A program
9222dominated by conversions should probably use a different data representation.
9223
9224@menu
9225* Binary to Radix::
9226* Radix to Binary::
9227@end menu
9228
9229
9230@node Binary to Radix, Radix to Binary, Radix Conversion Algorithms, Radix Conversion Algorithms
9231@subsection Binary to Radix
9232
9233Conversions from binary to a power-of-2 radix use a simple and fast
9234@math{O(N)} bit extraction algorithm.
9235
9236Conversions from binary to other radices use one of two algorithms. Sizes
9237below @code{GET_STR_PRECOMPUTE_THRESHOLD} use a basic @math{O(N^2)} method.
9238Repeated divisions by @math{b^n} are made, where @math{b} is the radix and
9239@math{n} is the biggest power that fits in a limb. But instead of simply
9240using the remainder @math{r} from such divisions, an extra divide step is done
9241to give a fractional limb representing @math{r/b^n}. The digits of @math{r}
9242can then be extracted using multiplications by @math{b} rather than divisions.
9243Special case code is provided for decimal, allowing multiplications by 10 to
9244optimize to shifts and adds.
9245
9246Above @code{GET_STR_PRECOMPUTE_THRESHOLD} a sub-quadratic algorithm is used.
9247For an input @math{t}, powers @m{b^{n2^i},b^(n*2^i)} of the radix are
9248calculated, until a power between @math{t} and @m{\sqrt{t},sqrt(t)} is
9249reached. @math{t} is then divided by that largest power, giving a quotient
9250which is the digits above that power, and a remainder which is those below.
9251These two parts are in turn divided by the second highest power, and so on
9252recursively. When a piece has been divided down to less than
9253@code{GET_STR_DC_THRESHOLD} limbs, the basecase algorithm described above is
9254used.
9255
9256The advantage of this algorithm is that big divisions can make use of the
9257sub-quadratic divide and conquer division (@pxref{Divide and Conquer
9258Division}), and big divisions tend to have less overheads than lots of
9259separate single limb divisions anyway. But in any case the cost of
9260calculating the powers @m{b^{n2^i},b^(n*2^i)} must first be overcome.
9261
9262@code{GET_STR_PRECOMPUTE_THRESHOLD} and @code{GET_STR_DC_THRESHOLD} represent
9263the same basic thing, the point where it becomes worth doing a big division to
9264cut the input in half. @code{GET_STR_PRECOMPUTE_THRESHOLD} includes the cost
9265of calculating the radix power required, whereas @code{GET_STR_DC_THRESHOLD}
9266assumes that's already available, which is the case when recursing.
9267
9268Since the base case produces digits from least to most significant but they
9269want to be stored from most to least, it's necessary to calculate in advance
9270how many digits there will be, or at least be sure not to underestimate that.
9271For GMP the number of input bits is multiplied by @code{chars_per_bit_exactly}
9272from @code{mp_bases}, rounding up. The result is either correct or one too
9273big.
9274
9275Examining some of the high bits of the input could increase the chance of
9276getting the exact number of digits, but an exact result every time would not
9277be practical, since in general the difference between numbers 100@dots{} and
927899@dots{} is only in the last few bits and the work to identify 99@dots{}
9279might well be almost as much as a full conversion.
9280
9281The @math{r/b^n} scheme described above for using multiplications to bring out
9282digits might be useful for more than a single limb. Some brief experiments
9283with it on the base case when recursing didn't give a noticeable improvement,
9284but perhaps that was only due to the implementation. Something similar would
9285work for the sub-quadratic divisions too, though there would be the cost of
9286calculating a bigger radix power.
9287
9288Another possible improvement for the sub-quadratic part would be to arrange
9289for radix powers that balanced the sizes of quotient and remainder produced,
9290i.e.@: the highest power would be an @m{b^{nk},b^(n*k)} approximately equal to
9291@m{\sqrt{t},sqrt(t)}, not restricted to a @math{2^i} factor. That ought to
9292smooth out a graph of times against sizes, but may or may not be a net
9293speedup.
9294
9295
9296@node Radix to Binary, , Binary to Radix, Radix Conversion Algorithms
9297@subsection Radix to Binary
9298
9299@strong{This section needs to be rewritten, it currently describes the
9300algorithms used before GMP 4.3.}
9301
9302Conversions from a power-of-2 radix into binary use a simple and fast
9303@math{O(N)} bitwise concatenation algorithm.
9304
9305Conversions from other radices use one of two algorithms. Sizes below
9306@code{SET_STR_PRECOMPUTE_THRESHOLD} use a basic @math{O(N^2)} method. Groups
9307of @math{n} digits are converted to limbs, where @math{n} is the biggest
9308power of the base @math{b} which will fit in a limb, then those groups are
9309accumulated into the result by multiplying by @math{b^n} and adding. This
9310saves multi-precision operations, as per Knuth section 4.4 part E
9311(@pxref{References}). Some special case code is provided for decimal, giving
9312the compiler a chance to optimize multiplications by 10.
9313
9314Above @code{SET_STR_PRECOMPUTE_THRESHOLD} a sub-quadratic algorithm is used.
9315First groups of @math{n} digits are converted into limbs. Then adjacent
9316limbs are combined into limb pairs with @m{xb^n+y,x*b^n+y}, where @math{x}
9317and @math{y} are the limbs. Adjacent limb pairs are combined into quads
9318similarly with @m{xb^{2n}+y,x*b^(2n)+y}. This continues until a single block
9319remains, that being the result.
9320
9321The advantage of this method is that the multiplications for each @math{x} are
9322big blocks, allowing Karatsuba and higher algorithms to be used. But the cost
9323of calculating the powers @m{b^{n2^i},b^(n*2^i)} must be overcome.
9324@code{SET_STR_PRECOMPUTE_THRESHOLD} usually ends up quite big, around 5000 digits, and on
9325some processors much bigger still.
9326
9327@code{SET_STR_PRECOMPUTE_THRESHOLD} is based on the input digits (and tuned
9328for decimal), though it might be better based on a limb count, so as to be
9329independent of the base. But that sort of count isn't used by the base case
9330and so would need some sort of initial calculation or estimate.
9331
9332The main reason @code{SET_STR_PRECOMPUTE_THRESHOLD} is so much bigger than the
9333corresponding @code{GET_STR_PRECOMPUTE_THRESHOLD} is that @code{mpn_mul_1} is
9334much faster than @code{mpn_divrem_1} (often by a factor of 5, or more).
9335
9336
9337@need 1000
9338@node Other Algorithms, Assembly Coding, Radix Conversion Algorithms, Algorithms
9339@section Other Algorithms
9340
9341@menu
9342* Prime Testing Algorithm::
9343* Factorial Algorithm::
9344* Binomial Coefficients Algorithm::
9345* Fibonacci Numbers Algorithm::
9346* Lucas Numbers Algorithm::
9347* Random Number Algorithms::
9348@end menu
9349
9350
9351@node Prime Testing Algorithm, Factorial Algorithm, Other Algorithms, Other Algorithms
9352@subsection Prime Testing
9353@cindex Prime testing algorithms
9354
9355The primality testing in @code{mpz_probab_prime_p} (@pxref{Number Theoretic
9356Functions}) first does some trial division by small factors and then uses the
9357Miller-Rabin probabilistic primality testing algorithm, as described in Knuth
9358section 4.5.4 algorithm P (@pxref{References}).
9359
9360For an odd input @math{n}, and with @math{n = q@GMPmultiply{}2^k+1} where
9361@math{q} is odd, this algorithm selects a random base @math{x} and tests
9362whether @math{x^q @bmod{} n} is 1 or @math{-1}, or an @m{x^{q2^j} \bmod n,
9363x^(q*2^j) mod n} is @math{1}, for @math{1@le{}j@le{}k}. If so then @math{n}
9364is probably prime, if not then @math{n} is definitely composite.
9365
9366Any prime @math{n} will pass the test, but some composites do too. Such
9367composites are known as strong pseudoprimes to base @math{x}. No @math{n} is
9368a strong pseudoprime to more than @math{1/4} of all bases (see Knuth exercise
936922), hence with @math{x} chosen at random there's no more than a @math{1/4}
9370chance a ``probable prime'' will in fact be composite.
9371
9372In fact strong pseudoprimes are quite rare, making the test much more
9373powerful than this analysis would suggest, but @math{1/4} is all that's proven
9374for an arbitrary @math{n}.
9375
9376
9377@node Factorial Algorithm, Binomial Coefficients Algorithm, Prime Testing Algorithm, Other Algorithms
9378@subsection Factorial
9379@cindex Factorial algorithm
9380
9381Factorials are calculated by a combination of two algorithms. An idea is
9382shared among them: to compute the odd part of the factorial; a final step
9383takes account of the power of @math{2} term, by shifting.
9384
9385For small @math{n}, the odd factor of @math{n!} is computed with the simple
9386observation that it is equal to the product of all positive odd numbers
9387smaller than @math{n} times the odd factor of @m{\lfloor n/2\rfloor!, [n/2]!},
9388where @m{\lfloor x\rfloor, [x]} is the integer part of @math{x}, and so on
9389recursively. The procedure can be best illustrated with an example,
9390
9391@quotation
9392@math{23! = (23.21.19.17.15.13.11.9.7.5.3)(11.9.7.5.3)(5.3)2^{19}}
9393@end quotation
9394
9395Current code collects all the factors in a single list, with a loop and no
9396recursion, and compute the product, with no special care for repeated chunks.
9397
9398When @math{n} is larger, computation pass trough prime sieving. An helper
9399function is used, as suggested by Peter Luschny:
9400@tex
9401$$\mathop{\rm msf}(n) = {n!\over\lfloor n/2\rfloor!^2\cdot2^k} = \prod_{p=3}^{n}
9402p^{\mathop{\rm L}(p,n)} $$
9403@end tex
9404@ifnottex
9405
9406@example
9407 n
9408 -----
9409 n! | | L(p,n)
9410msf(n) = -------------- = | | p
9411 [n/2]!^2.2^k p=3
9412@end example
9413@end ifnottex
9414
9415Where @math{p} ranges on odd prime numbers. The exponent @math{k} is chosen to
9416obtain an odd integer number: @math{k} is the number of 1 bits in the binary
9417representation of @m{\lfloor n/2\rfloor, [n/2]}. The function L@math{(p,n)}
9418can be defined as zero when @math{p} is composite, and, for any prime
9419@math{p}, it is computed with:
9420@tex
9421$$\mathop{\rm L}(p,n) = \sum_{i>0}\left\lfloor{n\over p^i}\right\rfloor\bmod2
9422\leq\log_p(n)$$
9423@end tex
9424@ifnottex
9425
9426@example
9427 ---
9428 \ n
9429L(p,n) = / [---] mod 2 <= log (n) .
9430 --- p^i p
9431 i>0
9432@end example
9433@end ifnottex
9434
9435With this helper function, we are able to compute the odd part of @math{n!}
9436using the recursion implied by @m{n!=\lfloor n/2\rfloor!^2\cdot\mathop{\rm
9437msf}(n)\cdot2^k , n!=[n/2]!^2*msf(n)*2^k}. The recursion stops using the
9438small-@math{n} algorithm on some @m{\lfloor n/2^i\rfloor, [n/2^i]}.
9439
9440Both the above algorithms use binary splitting to compute the product of many
9441small factors. At first as many products as possible are accumulated in a
9442single register, generating a list of factors that fit in a machine word. This
9443list is then split into halves, and the product is computed recursively.
9444
9445Such splitting is more efficient than repeated N@cross{}1 multiplies since it
9446forms big multiplies, allowing Karatsuba and higher algorithms to be used.
9447And even below the Karatsuba threshold a big block of work can be more
9448efficient for the basecase algorithm.
9449
9450
9451@node Binomial Coefficients Algorithm, Fibonacci Numbers Algorithm, Factorial Algorithm, Other Algorithms
9452@subsection Binomial Coefficients
9453@cindex Binomial coefficient algorithm
9454
9455Binomial coefficients @m{\left({n}\atop{k}\right), C(n@C{}k)} are calculated
9456by first arranging @math{k @le{} n/2} using @m{\left({n}\atop{k}\right) =
9457\left({n}\atop{n-k}\right), C(n@C{}k) = C(n@C{}n-k)} if necessary, and then
9458evaluating the following product simply from @math{i=2} to @math{i=k}.
9459@tex
9460$$ \left({n}\atop{k}\right) = (n-k+1) \prod_{i=2}^{k} {{n-k+i} \over i} $$
9461@end tex
9462@ifnottex
9463
9464@example
9465 k (n-k+i)
9466C(n,k) = (n-k+1) * prod -------
9467 i=2 i
9468@end example
9469
9470@end ifnottex
9471It's easy to show that each denominator @math{i} will divide the product so
9472far, so the exact division algorithm is used (@pxref{Exact Division}).
9473
9474The numerators @math{n-k+i} and denominators @math{i} are first accumulated
9475into as many fit a limb, to save multi-precision operations, though for
9476@code{mpz_bin_ui} this applies only to the divisors, since @math{n} is an
9477@code{mpz_t} and @math{n-k+i} in general won't fit in a limb at all.
9478
9479
9480@node Fibonacci Numbers Algorithm, Lucas Numbers Algorithm, Binomial Coefficients Algorithm, Other Algorithms
9481@subsection Fibonacci Numbers
9482@cindex Fibonacci number algorithm
9483
9484The Fibonacci functions @code{mpz_fib_ui} and @code{mpz_fib2_ui} are designed
9485for calculating isolated @m{F_n,F[n]} or @m{F_n,F[n]},@m{F_{n-1},F[n-1]}
9486values efficiently.
9487
9488For small @math{n}, a table of single limb values in @code{__gmp_fib_table} is
9489used. On a 32-bit limb this goes up to @m{F_{47},F[47]}, or on a 64-bit limb
9490up to @m{F_{93},F[93]}. For convenience the table starts at @m{F_{-1},F[-1]}.
9491
9492Beyond the table, values are generated with a binary powering algorithm,
9493calculating a pair @m{F_n,F[n]} and @m{F_{n-1},F[n-1]} working from high to
9494low across the bits of @math{n}. The formulas used are
9495@tex
9496$$\eqalign{
9497 F_{2k+1} &= 4F_k^2 - F_{k-1}^2 + 2(-1)^k \cr
9498 F_{2k-1} &= F_k^2 + F_{k-1}^2 \cr
9499 F_{2k} &= F_{2k+1} - F_{2k-1}
9500}$$
9501@end tex
9502@ifnottex
9503
9504@example
9505F[2k+1] = 4*F[k]^2 - F[k-1]^2 + 2*(-1)^k
9506F[2k-1] = F[k]^2 + F[k-1]^2
9507
9508F[2k] = F[2k+1] - F[2k-1]
9509@end example
9510
9511@end ifnottex
9512At each step, @math{k} is the high @math{b} bits of @math{n}. If the next bit
9513of @math{n} is 0 then @m{F_{2k},F[2k]},@m{F_{2k-1},F[2k-1]} is used, or if
9514it's a 1 then @m{F_{2k+1},F[2k+1]},@m{F_{2k},F[2k]} is used, and the process
9515repeated until all bits of @math{n} are incorporated. Notice these formulas
9516require just two squares per bit of @math{n}.
9517
9518It'd be possible to handle the first few @math{n} above the single limb table
9519with simple additions, using the defining Fibonacci recurrence @m{F_{k+1} =
9520F_k + F_{k-1}, F[k+1]=F[k]+F[k-1]}, but this is not done since it usually
9521turns out to be faster for only about 10 or 20 values of @math{n}, and
9522including a block of code for just those doesn't seem worthwhile. If they
9523really mattered it'd be better to extend the data table.
9524
9525Using a table avoids lots of calculations on small numbers, and makes small
9526@math{n} go fast. A bigger table would make more small @math{n} go fast, it's
9527just a question of balancing size against desired speed. For GMP the code is
9528kept compact, with the emphasis primarily on a good powering algorithm.
9529
9530@code{mpz_fib2_ui} returns both @m{F_n,F[n]} and @m{F_{n-1},F[n-1]}, but
9531@code{mpz_fib_ui} is only interested in @m{F_n,F[n]}. In this case the last
9532step of the algorithm can become one multiply instead of two squares. One of
9533the following two formulas is used, according as @math{n} is odd or even.
9534@tex
9535$$\eqalign{
9536 F_{2k} &= F_k (F_k + 2F_{k-1}) \cr
9537 F_{2k+1} &= (2F_k + F_{k-1}) (2F_k - F_{k-1}) + 2(-1)^k
9538}$$
9539@end tex
9540@ifnottex
9541
9542@example
9543F[2k] = F[k]*(F[k]+2F[k-1])
9544
9545F[2k+1] = (2F[k]+F[k-1])*(2F[k]-F[k-1]) + 2*(-1)^k
9546@end example
9547
9548@end ifnottex
9549@m{F_{2k+1},F[2k+1]} here is the same as above, just rearranged to be a
9550multiply. For interest, the @m{2(-1)^k, 2*(-1)^k} term both here and above
9551can be applied just to the low limb of the calculation, without a carry or
9552borrow into further limbs, which saves some code size. See comments with
9553@code{mpz_fib_ui} and the internal @code{mpn_fib2_ui} for how this is done.
9554
9555
9556@node Lucas Numbers Algorithm, Random Number Algorithms, Fibonacci Numbers Algorithm, Other Algorithms
9557@subsection Lucas Numbers
9558@cindex Lucas number algorithm
9559
9560@code{mpz_lucnum2_ui} derives a pair of Lucas numbers from a pair of Fibonacci
9561numbers with the following simple formulas.
9562@tex
9563$$\eqalign{
9564 L_k &= F_k + 2F_{k-1} \cr
9565 L_{k-1} &= 2F_k - F_{k-1}
9566}$$
9567@end tex
9568@ifnottex
9569
9570@example
9571L[k] = F[k] + 2*F[k-1]
9572L[k-1] = 2*F[k] - F[k-1]
9573@end example
9574
9575@end ifnottex
9576@code{mpz_lucnum_ui} is only interested in @m{L_n,L[n]}, and some work can be
9577saved. Trailing zero bits on @math{n} can be handled with a single square
9578each.
9579@tex
9580$$ L_{2k} = L_k^2 - 2(-1)^k $$
9581@end tex
9582@ifnottex
9583
9584@example
9585L[2k] = L[k]^2 - 2*(-1)^k
9586@end example
9587
9588@end ifnottex
9589And the lowest 1 bit can be handled with one multiply of a pair of Fibonacci
9590numbers, similar to what @code{mpz_fib_ui} does.
9591@tex
9592$$ L_{2k+1} = 5F_{k-1} (2F_k + F_{k-1}) - 4(-1)^k $$
9593@end tex
9594@ifnottex
9595
9596@example
9597L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*(-1)^k
9598@end example
9599
9600@end ifnottex
9601
9602
9603@node Random Number Algorithms, , Lucas Numbers Algorithm, Other Algorithms
9604@subsection Random Numbers
9605@cindex Random number algorithms
9606
9607For the @code{urandomb} functions, random numbers are generated simply by
9608concatenating bits produced by the generator. As long as the generator has
9609good randomness properties this will produce well-distributed @math{N} bit
9610numbers.
9611
9612For the @code{urandomm} functions, random numbers in a range @math{0@le{}R<N}
9613are generated by taking values @math{R} of @m{\lceil \log_2 N \rceil,
9614ceil(log2(N))} bits each until one satisfies @math{R<N}. This will normally
9615require only one or two attempts, but the attempts are limited in case the
9616generator is somehow degenerate and produces only 1 bits or similar.
9617
9618@cindex Mersenne twister algorithm
9619The Mersenne Twister generator is by Matsumoto and Nishimura
9620(@pxref{References}). It has a non-repeating period of @math{2^@W{19937}-1},
9621which is a Mersenne prime, hence the name of the generator. The state is 624
9622words of 32-bits each, which is iterated with one XOR and shift for each
962332-bit word generated, making the algorithm very fast. Randomness properties
9624are also very good and this is the default algorithm used by GMP.
9625
9626@cindex Linear congruential algorithm
9627Linear congruential generators are described in many text books, for instance
9628Knuth volume 2 (@pxref{References}). With a modulus @math{M} and parameters
9629@math{A} and @math{C}, an integer state @math{S} is iterated by the formula
9630@math{S @leftarrow{} A@GMPmultiply{}S+C @bmod{} M}. At each step the new
9631state is a linear function of the previous, mod @math{M}, hence the name of
9632the generator.
9633
9634In GMP only moduli of the form @math{2^N} are supported, and the current
9635implementation is not as well optimized as it could be. Overheads are
9636significant when @math{N} is small, and when @math{N} is large clearly the
9637multiply at each step will become slow. This is not a big concern, since the
9638Mersenne Twister generator is better in every respect and is therefore
9639recommended for all normal applications.
9640
9641For both generators the current state can be deduced by observing enough
9642output and applying some linear algebra (over GF(2) in the case of the
9643Mersenne Twister). This generally means raw output is unsuitable for
9644cryptographic applications without further hashing or the like.
9645
9646
9647@node Assembly Coding, , Other Algorithms, Algorithms
9648@section Assembly Coding
9649@cindex Assembly coding
9650
9651The assembly subroutines in GMP are the most significant source of speed at
9652small to moderate sizes. At larger sizes algorithm selection becomes more
9653important, but of course speedups in low level routines will still speed up
9654everything proportionally.
9655
9656Carry handling and widening multiplies that are important for GMP can't be
9657easily expressed in C@. GCC @code{asm} blocks help a lot and are provided in
9658@file{longlong.h}, but hand coding low level routines invariably offers a
9659speedup over generic C by a factor of anything from 2 to 10.
9660
9661@menu
9662* Assembly Code Organisation::
9663* Assembly Basics::
9664* Assembly Carry Propagation::
9665* Assembly Cache Handling::
9666* Assembly Functional Units::
9667* Assembly Floating Point::
9668* Assembly SIMD Instructions::
9669* Assembly Software Pipelining::
9670* Assembly Loop Unrolling::
9671* Assembly Writing Guide::
9672@end menu
9673
9674
9675@node Assembly Code Organisation, Assembly Basics, Assembly Coding, Assembly Coding
9676@subsection Code Organisation
9677@cindex Assembly code organisation
9678@cindex Code organisation
9679
9680The various @file{mpn} subdirectories contain machine-dependent code, written
9681in C or assembly. The @file{mpn/generic} subdirectory contains default code,
9682used when there's no machine-specific version of a particular file.
9683
9684Each @file{mpn} subdirectory is for an ISA family. Generally 32-bit and
968564-bit variants in a family cannot share code and have separate directories.
9686Within a family further subdirectories may exist for CPU variants.
9687
9688In each directory a @file{nails} subdirectory may exist, holding code with
9689nails support for that CPU variant. A @code{NAILS_SUPPORT} directive in each
9690file indicates the nails values the code handles. Nails code only exists
9691where it's faster, or promises to be faster, than plain code. There's no
9692effort put into nails if they're not going to enhance a given CPU.
9693
9694
9695@node Assembly Basics, Assembly Carry Propagation, Assembly Code Organisation, Assembly Coding
9696@subsection Assembly Basics
9697
9698@code{mpn_addmul_1} and @code{mpn_submul_1} are the most important routines
9699for overall GMP performance. All multiplications and divisions come down to
9700repeated calls to these. @code{mpn_add_n}, @code{mpn_sub_n},
9701@code{mpn_lshift} and @code{mpn_rshift} are next most important.
9702
9703On some CPUs assembly versions of the internal functions
9704@code{mpn_mul_basecase} and @code{mpn_sqr_basecase} give significant speedups,
9705mainly through avoiding function call overheads. They can also potentially
9706make better use of a wide superscalar processor, as can bigger primitives like
9707@code{mpn_addmul_2} or @code{mpn_addmul_4}.
9708
9709The restrictions on overlaps between sources and destinations
9710(@pxref{Low-level Functions}) are designed to facilitate a variety of
9711implementations. For example, knowing @code{mpn_add_n} won't have partly
9712overlapping sources and destination means reading can be done far ahead of
9713writing on superscalar processors, and loops can be vectorized on a vector
9714processor, depending on the carry handling.
9715
9716
9717@node Assembly Carry Propagation, Assembly Cache Handling, Assembly Basics, Assembly Coding
9718@subsection Carry Propagation
9719@cindex Assembly carry propagation
9720
9721The problem that presents most challenges in GMP is propagating carries from
9722one limb to the next. In functions like @code{mpn_addmul_1} and
9723@code{mpn_add_n}, carries are the only dependencies between limb operations.
9724
9725On processors with carry flags, a straightforward CISC style @code{adc} is
9726generally best. AMD K6 @code{mpn_addmul_1} however is an example of an
9727unusual set of circumstances where a branch works out better.
9728
9729On RISC processors generally an add and compare for overflow is used. This
9730sort of thing can be seen in @file{mpn/generic/aors_n.c}. Some carry
9731propagation schemes require 4 instructions, meaning at least 4 cycles per
9732limb, but other schemes may use just 1 or 2. On wide superscalar processors
9733performance may be completely determined by the number of dependent
9734instructions between carry-in and carry-out for each limb.
9735
9736On vector processors good use can be made of the fact that a carry bit only
9737very rarely propagates more than one limb. When adding a single bit to a
9738limb, there's only a carry out if that limb was @code{0xFF@dots{}FF} which on
9739random data will be only 1 in @m{2\GMPraise{@code{mp\_bits\_per\_limb}},
97402^mp_bits_per_limb}. @file{mpn/cray/add_n.c} is an example of this, it adds
9741all limbs in parallel, adds one set of carry bits in parallel and then only
9742rarely needs to fall through to a loop propagating further carries.
9743
9744On the x86s, GCC (as of version 2.95.2) doesn't generate particularly good code
9745for the RISC style idioms that are necessary to handle carry bits in
9746C@. Often conditional jumps are generated where @code{adc} or @code{sbb} forms
9747would be better. And so unfortunately almost any loop involving carry bits
9748needs to be coded in assembly for best results.
9749
9750
9751@node Assembly Cache Handling, Assembly Functional Units, Assembly Carry Propagation, Assembly Coding
9752@subsection Cache Handling
9753@cindex Assembly cache handling
9754
9755GMP aims to perform well both on operands that fit entirely in L1 cache and
9756those which don't.
9757
9758Basic routines like @code{mpn_add_n} or @code{mpn_lshift} are often used on
9759large operands, so L2 and main memory performance is important for them.
9760@code{mpn_mul_1} and @code{mpn_addmul_1} are mostly used for multiply and
9761square basecases, so L1 performance matters most for them, unless assembly
9762versions of @code{mpn_mul_basecase} and @code{mpn_sqr_basecase} exist, in
9763which case the remaining uses are mostly for larger operands.
9764
9765For L2 or main memory operands, memory access times will almost certainly be
9766more than the calculation time. The aim therefore is to maximize memory
9767throughput, by starting a load of the next cache line while processing the
9768contents of the previous one. Clearly this is only possible if the chip has a
9769lock-up free cache or some sort of prefetch instruction. Most current chips
9770have both these features.
9771
9772Prefetching sources combines well with loop unrolling, since a prefetch can be
9773initiated once per unrolled loop (or more than once if the loop covers more
9774than one cache line).
9775
9776On CPUs without write-allocate caches, prefetching destinations will ensure
9777individual stores don't go further down the cache hierarchy, limiting
9778bandwidth. Of course for calculations which are slow anyway, like
9779@code{mpn_divrem_1}, write-throughs might be fine.
9780
9781The distance ahead to prefetch will be determined by memory latency versus
9782throughput. The aim of course is to have data arriving continuously, at peak
9783throughput. Some CPUs have limits on the number of fetches or prefetches in
9784progress.
9785
9786If a special prefetch instruction doesn't exist then a plain load can be used,
9787but in that case care must be taken not to attempt to read past the end of an
9788operand, since that might produce a segmentation violation.
9789
9790Some CPUs or systems have hardware that detects sequential memory accesses and
9791initiates suitable cache movements automatically, making life easy.
9792
9793
9794@node Assembly Functional Units, Assembly Floating Point, Assembly Cache Handling, Assembly Coding
9795@subsection Functional Units
9796
9797When choosing an approach for an assembly loop, consideration is given to
9798what operations can execute simultaneously and what throughput can thereby be
9799achieved. In some cases an algorithm can be tweaked to accommodate available
9800resources.
9801
9802Loop control will generally require a counter and pointer updates, costing as
9803much as 5 instructions, plus any delays a branch introduces. CPU addressing
9804modes might reduce pointer updates, perhaps by allowing just one updating
9805pointer and others expressed as offsets from it, or on CISC chips with all
9806addressing done with the loop counter as a scaled index.
9807
9808The final loop control cost can be amortised by processing several limbs in
9809each iteration (@pxref{Assembly Loop Unrolling}). This at least ensures loop
9810control isn't a big fraction the work done.
9811
9812Memory throughput is always a limit. If perhaps only one load or one store
9813can be done per cycle then 3 cycles/limb will the top speed for ``binary''
9814operations like @code{mpn_add_n}, and any code achieving that is optimal.
9815
9816Integer resources can be freed up by having the loop counter in a float
9817register, or by pressing the float units into use for some multiplying,
9818perhaps doing every second limb on the float side (@pxref{Assembly Floating
9819Point}).
9820
9821Float resources can be freed up by doing carry propagation on the integer
9822side, or even by doing integer to float conversions in integers using bit
9823twiddling.
9824
9825
9826@node Assembly Floating Point, Assembly SIMD Instructions, Assembly Functional Units, Assembly Coding
9827@subsection Floating Point
9828@cindex Assembly floating Point
9829
9830Floating point arithmetic is used in GMP for multiplications on CPUs with poor
9831integer multipliers. It's mostly useful for @code{mpn_mul_1},
9832@code{mpn_addmul_1} and @code{mpn_submul_1} on 64-bit machines, and
9833@code{mpn_mul_basecase} on both 32-bit and 64-bit machines.
9834
9835With IEEE 53-bit double precision floats, integer multiplications producing up
9836to 53 bits will give exact results. Breaking a 64@cross{}64 multiplication
9837into eight 16@cross{}@math{32@rightarrow{}48} bit pieces is convenient. With
9838some care though six 21@cross{}@math{32@rightarrow{}53} bit products can be
9839used, if one of the lower two 21-bit pieces also uses the sign bit.
9840
9841For the @code{mpn_mul_1} family of functions on a 64-bit machine, the
9842invariant single limb is split at the start, into 3 or 4 pieces. Inside the
9843loop, the bignum operand is split into 32-bit pieces. Fast conversion of
9844these unsigned 32-bit pieces to floating point is highly machine-dependent.
9845In some cases, reading the data into the integer unit, zero-extending to
984664-bits, then transferring to the floating point unit back via memory is the
9847only option.
9848
9849Converting partial products back to 64-bit limbs is usually best done as a
9850signed conversion. Since all values are smaller than @m{2^{53},2^53}, signed
9851and unsigned are the same, but most processors lack unsigned conversions.
9852
9853@sp 2
9854
9855Here is a diagram showing 16@cross{}32 bit products for an @code{mpn_mul_1} or
9856@code{mpn_addmul_1} with a 64-bit limb. The single limb operand V is split
9857into four 16-bit parts. The multi-limb operand U is split in the loop into
9858two 32-bit parts.
9859
9860@tex
9861\global\newdimen\GMPbits \global\GMPbits=0.18em
9862\def\GMPbox#1#2#3{%
9863 \hbox{%
9864 \hbox to 128\GMPbits{\hfil
9865 \vbox{%
9866 \hrule
9867 \hbox to 48\GMPbits {\GMPvrule \hfil$#2$\hfil \vrule}%
9868 \hrule}%
9869 \hskip #1\GMPbits}%
9870 \raise \GMPboxdepth \hbox{\hskip 2em #3}}}
9871%
9872\GMPdisplay{%
9873 \vbox{%
9874 \hbox{%
9875 \hbox to 128\GMPbits {\hfil
9876 \vbox{%
9877 \hrule
9878 \hbox to 64\GMPbits{%
9879 \GMPvrule \hfil$v48$\hfil
9880 \vrule \hfil$v32$\hfil
9881 \vrule \hfil$v16$\hfil
9882 \vrule \hfil$v00$\hfil
9883 \vrule}
9884 \hrule}}%
9885 \raise \GMPboxdepth \hbox{\hskip 2em V Operand}}
9886 \vskip 0.5ex
9887 \hbox{%
9888 \hbox to 128\GMPbits {\hfil
9889 \raise \GMPboxdepth \hbox{$\times$\hskip 1.5em}%
9890 \vbox{%
9891 \hrule
9892 \hbox to 64\GMPbits {%
9893 \GMPvrule \hfil$u32$\hfil
9894 \vrule \hfil$u00$\hfil
9895 \vrule}%
9896 \hrule}}%
9897 \raise \GMPboxdepth \hbox{\hskip 2em U Operand (one limb)}}%
9898 \vskip 0.5ex
9899 \hbox{\vbox to 2ex{\hrule width 128\GMPbits}}%
9900 \GMPbox{0}{u00 \times v00}{$p00$\hskip 1.5em 48-bit products}%
9901 \vskip 0.5ex
9902 \GMPbox{16}{u00 \times v16}{$p16$}
9903 \vskip 0.5ex
9904 \GMPbox{32}{u00 \times v32}{$p32$}
9905 \vskip 0.5ex
9906 \GMPbox{48}{u00 \times v48}{$p48$}
9907 \vskip 0.5ex
9908 \GMPbox{32}{u32 \times v00}{$r32$}
9909 \vskip 0.5ex
9910 \GMPbox{48}{u32 \times v16}{$r48$}
9911 \vskip 0.5ex
9912 \GMPbox{64}{u32 \times v32}{$r64$}
9913 \vskip 0.5ex
9914 \GMPbox{80}{u32 \times v48}{$r80$}
9915}}
9916@end tex
9917@ifnottex
9918@example
9919@group
9920 +---+---+---+---+
9921 |v48|v32|v16|v00| V operand
9922 +---+---+---+---+
9923
9924 +-------+---+---+
9925 x | u32 | u00 | U operand (one limb)
9926 +---------------+
9927
9928---------------------------------
9929
9930 +-----------+
9931 | u00 x v00 | p00 48-bit products
9932 +-----------+
9933 +-----------+
9934 | u00 x v16 | p16
9935 +-----------+
9936 +-----------+
9937 | u00 x v32 | p32
9938 +-----------+
9939 +-----------+
9940 | u00 x v48 | p48
9941 +-----------+
9942 +-----------+
9943 | u32 x v00 | r32
9944 +-----------+
9945 +-----------+
9946 | u32 x v16 | r48
9947 +-----------+
9948 +-----------+
9949 | u32 x v32 | r64
9950 +-----------+
9951+-----------+
9952| u32 x v48 | r80
9953+-----------+
9954@end group
9955@end example
9956@end ifnottex
9957
9958@math{p32} and @math{r32} can be summed using floating-point addition, and
9959likewise @math{p48} and @math{r48}. @math{p00} and @math{p16} can be summed
9960with @math{r64} and @math{r80} from the previous iteration.
9961
9962For each loop then, four 49-bit quantities are transferred to the integer unit,
9963aligned as follows,
9964
9965@tex
9966% GMPbox here should be 49 bits wide, but use 51 to better show p16+r80'
9967% crossing into the upper 64 bits.
9968\def\GMPbox#1#2#3{%
9969 \hbox{%
9970 \hbox to 128\GMPbits {%
9971 \hfil
9972 \vbox{%
9973 \hrule
9974 \hbox to 51\GMPbits {\GMPvrule \hfil$#2$\hfil \vrule}%
9975 \hrule}%
9976 \hskip #1\GMPbits}%
9977 \raise \GMPboxdepth \hbox{\hskip 1.5em $#3$\hfil}%
9978}}
9979\newbox\b \setbox\b\hbox{64 bits}%
9980\newdimen\bw \bw=\wd\b \advance\bw by 2em
9981\newdimen\x \x=128\GMPbits
9982\advance\x by -2\bw
9983\divide\x by4
9984\GMPdisplay{%
9985 \vbox{%
9986 \hbox to 128\GMPbits {%
9987 \GMPvrule
9988 \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
9989 \hfil 64 bits\hfil
9990 \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
9991 \vrule
9992 \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
9993 \hfil 64 bits\hfil
9994 \raise 0.5ex \vbox{\hrule \hbox to \x {}}%
9995 \vrule}%
9996 \vskip 0.7ex
9997 \GMPbox{0}{p00+r64'}{i00}
9998 \vskip 0.5ex
9999 \GMPbox{16}{p16+r80'}{i16}
10000 \vskip 0.5ex
10001 \GMPbox{32}{p32+r32}{i32}
10002 \vskip 0.5ex
10003 \GMPbox{48}{p48+r48}{i48}
10004}}
10005@end tex
10006@ifnottex
10007@example
10008@group
10009|-----64bits----|-----64bits----|
10010 +------------+
10011 | p00 + r64' | i00
10012 +------------+
10013 +------------+
10014 | p16 + r80' | i16
10015 +------------+
10016 +------------+
10017 | p32 + r32 | i32
10018 +------------+
10019 +------------+
10020 | p48 + r48 | i48
10021 +------------+
10022@end group
10023@end example
10024@end ifnottex
10025
10026The challenge then is to sum these efficiently and add in a carry limb,
10027generating a low 64-bit result limb and a high 33-bit carry limb (@math{i48}
10028extends 33 bits into the high half).
10029
10030
10031@node Assembly SIMD Instructions, Assembly Software Pipelining, Assembly Floating Point, Assembly Coding
10032@subsection SIMD Instructions
10033@cindex Assembly SIMD
10034
10035The single-instruction multiple-data support in current microprocessors is
10036aimed at signal processing algorithms where each data point can be treated
10037more or less independently. There's generally not much support for
10038propagating the sort of carries that arise in GMP.
10039
10040SIMD multiplications of say four 16@cross{}16 bit multiplies only do as much
10041work as one 32@cross{}32 from GMP's point of view, and need some shifts and
10042adds besides. But of course if say the SIMD form is fully pipelined and uses
10043less instruction decoding then it may still be worthwhile.
10044
10045On the x86 chips, MMX has so far found a use in @code{mpn_rshift} and
10046@code{mpn_lshift}, and is used in a special case for 16-bit multipliers in the
10047P55 @code{mpn_mul_1}. SSE2 is used for Pentium 4 @code{mpn_mul_1},
10048@code{mpn_addmul_1}, and @code{mpn_submul_1}.
10049
10050
10051@node Assembly Software Pipelining, Assembly Loop Unrolling, Assembly SIMD Instructions, Assembly Coding
10052@subsection Software Pipelining
10053@cindex Assembly software pipelining
10054
10055Software pipelining consists of scheduling instructions around the branch
10056point in a loop. For example a loop might issue a load not for use in the
10057present iteration but the next, thereby allowing extra cycles for the data to
10058arrive from memory.
10059
10060Naturally this is wanted only when doing things like loads or multiplies that
10061take several cycles to complete, and only where a CPU has multiple functional
10062units so that other work can be done in the meantime.
10063
10064A pipeline with several stages will have a data value in progress at each
10065stage and each loop iteration moves them along one stage. This is like
10066juggling.
10067
10068If the latency of some instruction is greater than the loop time then it will
10069be necessary to unroll, so one register has a result ready to use while
10070another (or multiple others) are still in progress. (@pxref{Assembly Loop
10071Unrolling}).
10072
10073
10074@node Assembly Loop Unrolling, Assembly Writing Guide, Assembly Software Pipelining, Assembly Coding
10075@subsection Loop Unrolling
10076@cindex Assembly loop unrolling
10077
10078Loop unrolling consists of replicating code so that several limbs are
10079processed in each loop. At a minimum this reduces loop overheads by a
10080corresponding factor, but it can also allow better register usage, for example
10081alternately using one register combination and then another. Judicious use of
10082@command{m4} macros can help avoid lots of duplication in the source code.
10083
10084Any amount of unrolling can be handled with a loop counter that's decremented
10085by @math{N} each time, stopping when the remaining count is less than the
10086further @math{N} the loop will process. Or by subtracting @math{N} at the
10087start, the termination condition becomes when the counter @math{C} is less
10088than 0 (and the count of remaining limbs is @math{C+N}).
10089
10090Alternately for a power of 2 unroll the loop count and remainder can be
10091established with a shift and mask. This is convenient if also making a
10092computed jump into the middle of a large loop.
10093
10094The limbs not a multiple of the unrolling can be handled in various ways, for
10095example
10096
10097@itemize @bullet
10098@item
10099A simple loop at the end (or the start) to process the excess. Care will be
10100wanted that it isn't too much slower than the unrolled part.
10101
10102@item
10103A set of binary tests, for example after an 8-limb unrolling, test for 4 more
10104limbs to process, then a further 2 more or not, and finally 1 more or not.
10105This will probably take more code space than a simple loop.
10106
10107@item
10108A @code{switch} statement, providing separate code for each possible excess,
10109for example an 8-limb unrolling would have separate code for 0 remaining, 1
10110remaining, etc, up to 7 remaining. This might take a lot of code, but may be
10111the best way to optimize all cases in combination with a deep pipelined loop.
10112
10113@item
10114A computed jump into the middle of the loop, thus making the first iteration
10115handle the excess. This should make times smoothly increase with size, which
10116is attractive, but setups for the jump and adjustments for pointers can be
10117tricky and could become quite difficult in combination with deep pipelining.
10118@end itemize
10119
10120
10121@node Assembly Writing Guide, , Assembly Loop Unrolling, Assembly Coding
10122@subsection Writing Guide
10123@cindex Assembly writing guide
10124
10125This is a guide to writing software pipelined loops for processing limb
10126vectors in assembly.
10127
10128First determine the algorithm and which instructions are needed. Code it
10129without unrolling or scheduling, to make sure it works. On a 3-operand CPU
10130try to write each new value to a new register, this will greatly simplify later
10131steps.
10132
10133Then note for each instruction the functional unit and/or issue port
10134requirements. If an instruction can use either of two units, like U0 or U1
10135then make a category ``U0/U1''. Count the total using each unit (or combined
10136unit), and count all instructions.
10137
10138Figure out from those counts the best possible loop time. The goal will be to
10139find a perfect schedule where instruction latencies are completely hidden.
10140The total instruction count might be the limiting factor, or perhaps a
10141particular functional unit. It might be possible to tweak the instructions to
10142help the limiting factor.
10143
10144Suppose the loop time is @math{N}, then make @math{N} issue buckets, with the
10145final loop branch at the end of the last. Now fill the buckets with dummy
10146instructions using the functional units desired. Run this to make sure the
10147intended speed is reached.
10148
10149Now replace the dummy instructions with the real instructions from the slow
10150but correct loop you started with. The first will typically be a load
10151instruction. Then the instruction using that value is placed in a bucket an
10152appropriate distance down. Run the loop again, to check it still runs at
10153target speed.
10154
10155Keep placing instructions, frequently measuring the loop. After a few you
10156will need to wrap around from the last bucket back to the top of the loop. If
10157you used the new-register for new-value strategy above then there will be no
10158register conflicts. If not then take care not to clobber something already in
10159use. Changing registers at this time is very error prone.
10160
10161The loop will overlap two or more of the original loop iterations, and the
10162computation of one vector element result will be started in one iteration of
10163the new loop, and completed one or several iterations later.
10164
10165The final step is to create feed-in and wind-down code for the loop. A good
10166way to do this is to make a copy (or copies) of the loop at the start and
10167delete those instructions which don't have valid antecedents, and at the end
10168replicate and delete those whose results are unwanted (including any further
10169loads).
10170
10171The loop will have a minimum number of limbs loaded and processed, so the
10172feed-in code must test if the request size is smaller and skip either to a
10173suitable part of the wind-down or to special code for small sizes.
10174
10175
10176@node Internals, Contributors, Algorithms, Top
10177@chapter Internals
10178@cindex Internals
10179
10180@strong{This chapter is provided only for informational purposes and the
10181various internals described here may change in future GMP releases.
10182Applications expecting to be compatible with future releases should use only
10183the documented interfaces described in previous chapters.}
10184
10185@menu
10186* Integer Internals::
10187* Rational Internals::
10188* Float Internals::
10189* Raw Output Internals::
10190* C++ Interface Internals::
10191@end menu
10192
10193@node Integer Internals, Rational Internals, Internals, Internals
10194@section Integer Internals
10195@cindex Integer internals
10196
10197@code{mpz_t} variables represent integers using sign and magnitude, in space
10198dynamically allocated and reallocated. The fields are as follows.
10199
10200@table @asis
10201@item @code{_mp_size}
10202The number of limbs, or the negative of that when representing a negative
10203integer. Zero is represented by @code{_mp_size} set to zero, in which case
10204the @code{_mp_d} data is undefined.
10205
10206@item @code{_mp_d}
10207A pointer to an array of limbs which is the magnitude. These are stored
10208``little endian'' as per the @code{mpn} functions, so @code{_mp_d[0]} is the
10209least significant limb and @code{_mp_d[ABS(_mp_size)-1]} is the most
10210significant. Whenever @code{_mp_size} is non-zero, the most significant limb
10211is non-zero.
10212
10213Currently there's always at least one readable limb, so for instance
10214@code{mpz_get_ui} can fetch @code{_mp_d[0]} unconditionally (though its value
10215is undefined if @code{_mp_size} is zero).
10216
10217@item @code{_mp_alloc}
10218@code{_mp_alloc} is the number of limbs currently allocated at @code{_mp_d},
10219and normally @code{_mp_alloc >= ABS(_mp_size)}. When an @code{mpz} routine
10220is about to (or might be about to) increase @code{_mp_size}, it checks
10221@code{_mp_alloc} to see whether there's enough space, and reallocates if not.
10222@code{MPZ_REALLOC} is generally used for this.
10223
10224@code{mpz_t} variables initialised with the @code{mpz_roinit_n} function or
10225the @code{MPZ_ROINIT_N} macro have @code{_mp_alloc = 0} but can have a
10226non-zero @code{_mp_size}. They can only be used as read-only constants. See
10227@ref{Integer Special Functions} for details.
10228@end table
10229
10230The various bitwise logical functions like @code{mpz_and} behave as if
10231negative values were twos complement. But sign and magnitude is always used
10232internally, and necessary adjustments are made during the calculations.
10233Sometimes this isn't pretty, but sign and magnitude are best for other
10234routines.
10235
10236Some internal temporary variables are setup with @code{MPZ_TMP_INIT} and these
10237have @code{_mp_d} space obtained from @code{TMP_ALLOC} rather than the memory
10238allocation functions. Care is taken to ensure that these are big enough that
10239no reallocation is necessary (since it would have unpredictable consequences).
10240
10241@code{_mp_size} and @code{_mp_alloc} are @code{int}, although @code{mp_size_t}
10242is usually a @code{long}. This is done to make the fields just 32 bits on
10243some 64 bits systems, thereby saving a few bytes of data space but still
10244providing plenty of range.
10245
10246
10247@node Rational Internals, Float Internals, Integer Internals, Internals
10248@section Rational Internals
10249@cindex Rational internals
10250
10251@code{mpq_t} variables represent rationals using an @code{mpz_t} numerator and
10252denominator (@pxref{Integer Internals}).
10253
10254The canonical form adopted is denominator positive (and non-zero), no common
10255factors between numerator and denominator, and zero uniquely represented as
102560/1.
10257
10258It's believed that casting out common factors at each stage of a calculation
10259is best in general. A GCD is an @math{O(N^2)} operation so it's better to do
10260a few small ones immediately than to delay and have to do a big one later.
10261Knowing the numerator and denominator have no common factors can be used for
10262example in @code{mpq_mul} to make only two cross GCDs necessary, not four.
10263
10264This general approach to common factors is badly sub-optimal in the presence
10265of simple factorizations or little prospect for cancellation, but GMP has no
10266way to know when this will occur. As per @ref{Efficiency}, that's left to
10267applications. The @code{mpq_t} framework might still suit, with
10268@code{mpq_numref} and @code{mpq_denref} for direct access to the numerator and
10269denominator, or of course @code{mpz_t} variables can be used directly.
10270
10271
10272@node Float Internals, Raw Output Internals, Rational Internals, Internals
10273@section Float Internals
10274@cindex Float internals
10275
10276Efficient calculation is the primary aim of GMP floats and the use of whole
10277limbs and simple rounding facilitates this.
10278
10279@code{mpf_t} floats have a variable precision mantissa and a single machine
10280word signed exponent. The mantissa is represented using sign and magnitude.
10281
10282@c FIXME: The arrow heads don't join to the lines exactly.
10283@tex
10284\global\newdimen\GMPboxwidth \GMPboxwidth=5em
10285\global\newdimen\GMPboxheight \GMPboxheight=3ex
10286\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
10287\GMPdisplay{%
10288\vbox{%
10289 \hbox to 5\GMPboxwidth {most significant limb \hfil least significant limb}
10290 \vskip 0.7ex
10291 \def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
10292 \hbox {
10293 \hbox to 3\GMPboxwidth {%
10294 \setbox 0 = \hbox{@code{\_mp\_exp}}%
10295 \dimen0=3\GMPboxwidth
10296 \advance\dimen0 by -\wd0
10297 \divide\dimen0 by 2
10298 \advance\dimen0 by -1em
10299 \setbox1 = \hbox{$\rightarrow$}%
10300 \dimen1=\dimen0
10301 \advance\dimen1 by -\wd1
10302 \GMPcentreline{\dimen0}%
10303 \hfil
10304 \box0%
10305 \hfil
10306 \GMPcentreline{\dimen1{}}%
10307 \box1}
10308 \hbox to 2\GMPboxwidth {\hfil @code{\_mp\_d}}}
10309 \vskip 0.5ex
10310 \vbox {%
10311 \hrule
10312 \hbox{%
10313 \vrule height 2ex depth 1ex
10314 \hbox to \GMPboxwidth {}%
10315 \vrule
10316 \hbox to \GMPboxwidth {}%
10317 \vrule
10318 \hbox to \GMPboxwidth {}%
10319 \vrule
10320 \hbox to \GMPboxwidth {}%
10321 \vrule
10322 \hbox to \GMPboxwidth {}%
10323 \vrule}
10324 \hrule
10325 }
10326 \hbox {%
10327 \hbox to 0.8 pt {}
10328 \hbox to 3\GMPboxwidth {%
10329 \hfil $\cdot$} \hbox {$\leftarrow$ radix point\hfil}}
10330 \hbox to 5\GMPboxwidth{%
10331 \setbox 0 = \hbox{@code{\_mp\_size}}%
10332 \dimen0 = 5\GMPboxwidth
10333 \advance\dimen0 by -\wd0
10334 \divide\dimen0 by 2
10335 \advance\dimen0 by -1em
10336 \dimen1 = \dimen0
10337 \setbox1 = \hbox{$\leftarrow$}%
10338 \setbox2 = \hbox{$\rightarrow$}%
10339 \advance\dimen0 by -\wd1
10340 \advance\dimen1 by -\wd2
10341 \hbox to 0.3 em {}%
10342 \box1
10343 \GMPcentreline{\dimen0}%
10344 \hfil
10345 \box0
10346 \hfil
10347 \GMPcentreline{\dimen1}%
10348 \box2}
10349}}
10350@end tex
10351@ifnottex
10352@example
10353 most least
10354significant significant
10355 limb limb
10356
10357 _mp_d
10358 |---- _mp_exp ---> |
10359 _____ _____ _____ _____ _____
10360 |_____|_____|_____|_____|_____|
10361 . <------------ radix point
10362
10363 <-------- _mp_size --------->
10364@sp 1
10365@end example
10366@end ifnottex
10367
10368@noindent
10369The fields are as follows.
10370
10371@table @asis
10372@item @code{_mp_size}
10373The number of limbs currently in use, or the negative of that when
10374representing a negative value. Zero is represented by @code{_mp_size} and
10375@code{_mp_exp} both set to zero, and in that case the @code{_mp_d} data is
10376unused. (In the future @code{_mp_exp} might be undefined when representing
10377zero.)
10378
10379@item @code{_mp_prec}
10380The precision of the mantissa, in limbs. In any calculation the aim is to
10381produce @code{_mp_prec} limbs of result (the most significant being non-zero).
10382
10383@item @code{_mp_d}
10384A pointer to the array of limbs which is the absolute value of the mantissa.
10385These are stored ``little endian'' as per the @code{mpn} functions, so
10386@code{_mp_d[0]} is the least significant limb and
10387@code{_mp_d[ABS(_mp_size)-1]} the most significant.
10388
10389The most significant limb is always non-zero, but there are no other
10390restrictions on its value, in particular the highest 1 bit can be anywhere
10391within the limb.
10392
10393@code{_mp_prec+1} limbs are allocated to @code{_mp_d}, the extra limb being
10394for convenience (see below). There are no reallocations during a calculation,
10395only in a change of precision with @code{mpf_set_prec}.
10396
10397@item @code{_mp_exp}
10398The exponent, in limbs, determining the location of the implied radix point.
10399Zero means the radix point is just above the most significant limb. Positive
10400values mean a radix point offset towards the lower limbs and hence a value
10401@math{@ge{} 1}, as for example in the diagram above. Negative exponents mean
10402a radix point further above the highest limb.
10403
10404Naturally the exponent can be any value, it doesn't have to fall within the
10405limbs as the diagram shows, it can be a long way above or a long way below.
10406Limbs other than those included in the @code{@{_mp_d,_mp_size@}} data
10407are treated as zero.
10408@end table
10409
10410The @code{_mp_size} and @code{_mp_prec} fields are @code{int}, although the
10411@code{mp_size_t} type is usually a @code{long}. The @code{_mp_exp} field is
10412usually @code{long}. This is done to make some fields just 32 bits on some 64
10413bits systems, thereby saving a few bytes of data space but still providing
10414plenty of precision and a very large range.
10415
10416
10417@sp 1
10418@noindent
10419The following various points should be noted.
10420
10421@table @asis
10422@item Low Zeros
10423The least significant limbs @code{_mp_d[0]} etc can be zero, though such low
10424zeros can always be ignored. Routines likely to produce low zeros check and
10425avoid them to save time in subsequent calculations, but for most routines
10426they're quite unlikely and aren't checked.
10427
10428@item Mantissa Size Range
10429The @code{_mp_size} count of limbs in use can be less than @code{_mp_prec} if
10430the value can be represented in less. This means low precision values or
10431small integers stored in a high precision @code{mpf_t} can still be operated
10432on efficiently.
10433
10434@code{_mp_size} can also be greater than @code{_mp_prec}. Firstly a value is
10435allowed to use all of the @code{_mp_prec+1} limbs available at @code{_mp_d},
10436and secondly when @code{mpf_set_prec_raw} lowers @code{_mp_prec} it leaves
10437@code{_mp_size} unchanged and so the size can be arbitrarily bigger than
10438@code{_mp_prec}.
10439
10440@item Rounding
10441All rounding is done on limb boundaries. Calculating @code{_mp_prec} limbs
10442with the high non-zero will ensure the application requested minimum precision
10443is obtained.
10444
10445The use of simple ``trunc'' rounding towards zero is efficient, since there's
10446no need to examine extra limbs and increment or decrement.
10447
10448@item Bit Shifts
10449Since the exponent is in limbs, there are no bit shifts in basic operations
10450like @code{mpf_add} and @code{mpf_mul}. When differing exponents are
10451encountered all that's needed is to adjust pointers to line up the relevant
10452limbs.
10453
10454Of course @code{mpf_mul_2exp} and @code{mpf_div_2exp} will require bit shifts,
10455but the choice is between an exponent in limbs which requires shifts there, or
10456one in bits which requires them almost everywhere else.
10457
10458@item Use of @code{_mp_prec+1} Limbs
10459The extra limb on @code{_mp_d} (@code{_mp_prec+1} rather than just
10460@code{_mp_prec}) helps when an @code{mpf} routine might get a carry from its
10461operation. @code{mpf_add} for instance will do an @code{mpn_add} of
10462@code{_mp_prec} limbs. If there's no carry then that's the result, but if
10463there is a carry then it's stored in the extra limb of space and
10464@code{_mp_size} becomes @code{_mp_prec+1}.
10465
10466Whenever @code{_mp_prec+1} limbs are held in a variable, the low limb is not
10467needed for the intended precision, only the @code{_mp_prec} high limbs. But
10468zeroing it out or moving the rest down is unnecessary. Subsequent routines
10469reading the value will simply take the high limbs they need, and this will be
10470@code{_mp_prec} if their target has that same precision. This is no more than
10471a pointer adjustment, and must be checked anyway since the destination
10472precision can be different from the sources.
10473
10474Copy functions like @code{mpf_set} will retain a full @code{_mp_prec+1} limbs
10475if available. This ensures that a variable which has @code{_mp_size} equal to
10476@code{_mp_prec+1} will get its full exact value copied. Strictly speaking
10477this is unnecessary since only @code{_mp_prec} limbs are needed for the
10478application's requested precision, but it's considered that an @code{mpf_set}
10479from one variable into another of the same precision ought to produce an exact
10480copy.
10481
10482@item Application Precisions
10483@code{__GMPF_BITS_TO_PREC} converts an application requested precision to an
10484@code{_mp_prec}. The value in bits is rounded up to a whole limb then an
10485extra limb is added since the most significant limb of @code{_mp_d} is only
10486non-zero and therefore might contain only one bit.
10487
10488@code{__GMPF_PREC_TO_BITS} does the reverse conversion, and removes the extra
10489limb from @code{_mp_prec} before converting to bits. The net effect of
10490reading back with @code{mpf_get_prec} is simply the precision rounded up to a
10491multiple of @code{mp_bits_per_limb}.
10492
10493Note that the extra limb added here for the high only being non-zero is in
10494addition to the extra limb allocated to @code{_mp_d}. For example with a
1049532-bit limb, an application request for 250 bits will be rounded up to 8
10496limbs, then an extra added for the high being only non-zero, giving an
10497@code{_mp_prec} of 9. @code{_mp_d} then gets 10 limbs allocated. Reading
10498back with @code{mpf_get_prec} will take @code{_mp_prec} subtract 1 limb and
10499multiply by 32, giving 256 bits.
10500
10501Strictly speaking, the fact the high limb has at least one bit means that a
10502float with, say, 3 limbs of 32-bits each will be holding at least 65 bits, but
10503for the purposes of @code{mpf_t} it's considered simply to be 64 bits, a nice
10504multiple of the limb size.
10505@end table
10506
10507
10508@node Raw Output Internals, C++ Interface Internals, Float Internals, Internals
10509@section Raw Output Internals
10510@cindex Raw output internals
10511
10512@noindent
10513@code{mpz_out_raw} uses the following format.
10514
10515@tex
10516\global\newdimen\GMPboxwidth \GMPboxwidth=5em
10517\global\newdimen\GMPboxheight \GMPboxheight=3ex
10518\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
10519\GMPdisplay{%
10520\vbox{%
10521 \def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
10522 \vbox {%
10523 \hrule
10524 \hbox{%
10525 \vrule height 2.5ex depth 1.5ex
10526 \hbox to \GMPboxwidth {\hfil size\hfil}%
10527 \vrule
10528 \hbox to 3\GMPboxwidth {\hfil data bytes\hfil}%
10529 \vrule}
10530 \hrule}
10531}}
10532@end tex
10533@ifnottex
10534@example
10535+------+------------------------+
10536| size | data bytes |
10537+------+------------------------+
10538@end example
10539@end ifnottex
10540
10541The size is 4 bytes written most significant byte first, being the number of
10542subsequent data bytes, or the twos complement negative of that when a negative
10543integer is represented. The data bytes are the absolute value of the integer,
10544written most significant byte first.
10545
10546The most significant data byte is always non-zero, so the output is the same
10547on all systems, irrespective of limb size.
10548
10549In GMP 1, leading zero bytes were written to pad the data bytes to a multiple
10550of the limb size. @code{mpz_inp_raw} will still accept this, for
10551compatibility.
10552
10553The use of ``big endian'' for both the size and data fields is deliberate, it
10554makes the data easy to read in a hex dump of a file. Unfortunately it also
10555means that the limb data must be reversed when reading or writing, so neither
10556a big endian nor little endian system can just read and write @code{_mp_d}.
10557
10558
10559@node C++ Interface Internals, , Raw Output Internals, Internals
10560@section C++ Interface Internals
10561@cindex C++ interface internals
10562
10563A system of expression templates is used to ensure something like @code{a=b+c}
10564turns into a simple call to @code{mpz_add} etc. For @code{mpf_class}
10565the scheme also ensures the precision of the final
10566destination is used for any temporaries within a statement like
10567@code{f=w*x+y*z}. These are important features which a naive implementation
10568cannot provide.
10569
10570A simplified description of the scheme follows. The true scheme is
10571complicated by the fact that expressions have different return types. For
10572detailed information, refer to the source code.
10573
10574To perform an operation, say, addition, we first define a ``function object''
10575evaluating it,
10576
10577@example
10578struct __gmp_binary_plus
10579@{
10580 static void eval(mpf_t f, const mpf_t g, const mpf_t h)
10581 @{
10582 mpf_add(f, g, h);
10583 @}
10584@};
10585@end example
10586
10587@noindent
10588And an ``additive expression'' object,
10589
10590@example
10591__gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
10592operator+(const mpf_class &f, const mpf_class &g)
10593@{
10594 return __gmp_expr
10595 <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
10596@}
10597@end example
10598
10599The seemingly redundant @code{__gmp_expr<__gmp_binary_expr<@dots{}>>} is used to
10600encapsulate any possible kind of expression into a single template type. In
10601fact even @code{mpf_class} etc are @code{typedef} specializations of
10602@code{__gmp_expr}.
10603
10604Next we define assignment of @code{__gmp_expr} to @code{mpf_class}.
10605
10606@example
10607template <class T>
10608mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
10609@{
10610 expr.eval(this->get_mpf_t(), this->precision());
10611 return *this;
10612@}
10613
10614template <class Op>
10615void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
10616(mpf_t f, mp_bitcnt_t precision)
10617@{
10618 Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
10619@}
10620@end example
10621
10622where @code{expr.val1} and @code{expr.val2} are references to the expression's
10623operands (here @code{expr} is the @code{__gmp_binary_expr} stored within the
10624@code{__gmp_expr}).
10625
10626This way, the expression is actually evaluated only at the time of assignment,
10627when the required precision (that of @code{f}) is known. Furthermore the
10628target @code{mpf_t} is now available, thus we can call @code{mpf_add} directly
10629with @code{f} as the output argument.
10630
10631Compound expressions are handled by defining operators taking subexpressions
10632as their arguments, like this:
10633
10634@example
10635template <class T, class U>
10636__gmp_expr
10637<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
10638operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
10639@{
10640 return __gmp_expr
10641 <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
10642 (expr1, expr2);
10643@}
10644@end example
10645
10646And the corresponding specializations of @code{__gmp_expr::eval}:
10647
10648@example
10649template <class T, class U, class Op>
10650void __gmp_expr
10651<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
10652(mpf_t f, mp_bitcnt_t precision)
10653@{
10654 // declare two temporaries
10655 mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
10656 Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
10657@}
10658@end example
10659
10660The expression is thus recursively evaluated to any level of complexity and
10661all subexpressions are evaluated to the precision of @code{f}.
10662
10663
10664@node Contributors, References, Internals, Top
10665@comment node-name, next, previous, up
10666@appendix Contributors
10667@cindex Contributors
10668
10669Torbj@"orn Granlund wrote the original GMP library and is still the main
10670developer. Code not explicitly attributed to others, was contributed by
10671Torbj@"orn. Several other individuals and organizations have contributed
10672GMP. Here is a list in chronological order on first contribution:
10673
10674Gunnar Sj@"odin and Hans Riesel helped with mathematical problems in early
10675versions of the library.
10676
10677Richard Stallman helped with the interface design and revised the first
10678version of this manual.
10679
10680Brian Beuning and Doug Lea helped with testing of early versions of the
10681library and made creative suggestions.
10682
10683John Amanatides of York University in Canada contributed the function
10684@code{mpz_probab_prime_p}.
10685
10686Paul Zimmermann wrote the REDC-based mpz_powm code, the Sch@"onhage-Strassen
10687FFT multiply code, and the Karatsuba square root code. He also improved the
10688Toom3 code for GMP 4.2. Paul sparked the development of GMP 2, with his
10689comparisons between bignum packages. The ECMNET project Paul is organizing
10690was a driving force behind many of the optimizations in GMP 3. Paul also
10691wrote the new GMP 4.3 nth root code (with Torbj@"orn).
10692
10693Ken Weber (Kent State University, Universidade Federal do Rio Grande do Sul)
10694contributed now defunct versions of @code{mpz_gcd}, @code{mpz_divexact},
10695@code{mpn_gcd}, and @code{mpn_bdivmod}, partially supported by CNPq (Brazil)
10696grant 301314194-2.
10697
10698Per Bothner of Cygnus Support helped to set up GMP to use Cygnus' configure.
10699He has also made valuable suggestions and tested numerous intermediary
10700releases.
10701
10702Joachim Hollman was involved in the design of the @code{mpf} interface, and in
10703the @code{mpz} design revisions for version 2.
10704
10705Bennet Yee contributed the initial versions of @code{mpz_jacobi} and
10706@code{mpz_legendre}.
10707
10708Andreas Schwab contributed the files @file{mpn/m68k/lshift.S} and
10709@file{mpn/m68k/rshift.S} (now in @file{.asm} form).
10710
10711Robert Harley of Inria, France and David Seal of ARM, England, suggested clever
10712improvements for population count. Robert also wrote highly optimized
10713Karatsuba and 3-way Toom multiplication functions for GMP 3, and contributed
10714the ARM assembly code.
10715
10716Torsten Ekedahl of the Mathematical department of Stockholm University provided
10717significant inspiration during several phases of the GMP development. His
10718mathematical expertise helped improve several algorithms.
10719
10720Linus Nordberg wrote the new configure system based on autoconf and
10721implemented the new random functions.
10722
10723Kevin Ryde worked on a large number of things: optimized x86 code, m4 asm
10724macros, parameter tuning, speed measuring, the configure system, function
10725inlining, divisibility tests, bit scanning, Jacobi symbols, Fibonacci and Lucas
10726number functions, printf and scanf functions, perl interface, demo expression
10727parser, the algorithms chapter in the manual, @file{gmpasm-mode.el}, and
10728various miscellaneous improvements elsewhere.
10729
10730Kent Boortz made the Mac OS 9 port.
10731
10732Steve Root helped write the optimized alpha 21264 assembly code.
10733
10734Gerardo Ballabio wrote the @file{gmpxx.h} C++ class interface and the C++
10735@code{istream} input routines.
10736
10737Jason Moxham rewrote @code{mpz_fac_ui}.
10738
10739Pedro Gimeno implemented the Mersenne Twister and made other random number
10740improvements.
10741
10742Niels M@"oller wrote the sub-quadratic GCD, extended GCD and jacobi code, the
10743quadratic Hensel division code, and (with Torbj@"orn) the new divide and
10744conquer division code for GMP 4.3. Niels also helped implement the new Toom
10745multiply code for GMP 4.3 and implemented helper functions to simplify Toom
10746evaluations for GMP 5.0. He wrote the original version of mpn_mulmod_bnm1, and
10747he is the main author of the mini-gmp package used for gmp bootstrapping.
10748
10749Alberto Zanoni and Marco Bodrato suggested the unbalanced multiply strategy,
10750and found the optimal strategies for evaluation and interpolation in Toom
10751multiplication.
10752
10753Marco Bodrato helped implement the new Toom multiply code for GMP 4.3 and
10754implemented most of the new Toom multiply and squaring code for 5.0.
10755He is the main author of the current mpn_mulmod_bnm1, mpn_mullo_n, and
10756mpn_sqrlo. Marco also wrote the functions mpn_invert and mpn_invertappr,
10757and improved the speed of integer root extraction. He is the author of
10758mini-mpq, an additional layer to mini-gmp; of most of the combinatorial
10759functions and the BPSW primality testing implementation, for both the
10760main library and the mini-gmp package.
10761
10762David Harvey suggested the internal function @code{mpn_bdiv_dbm1}, implementing
10763division relevant to Toom multiplication. He also worked on fast assembly
10764sequences, in particular on a fast AMD64 @code{mpn_mul_basecase}. He wrote
10765the internal middle product functions @code{mpn_mulmid_basecase},
10766@code{mpn_toom42_mulmid}, @code{mpn_mulmid_n} and related helper routines.
10767
10768Martin Boij wrote @code{mpn_perfect_power_p}.
10769
10770Marc Glisse improved @file{gmpxx.h}: use fewer temporaries (faster),
10771specializations of @code{numeric_limits} and @code{common_type}, C++11
10772features (move constructors, explicit bool conversion, UDL), make the
10773conversion from @code{mpq_class} to @code{mpz_class} explicit, optimize
10774operations where one argument is a small compile-time constant, replace
10775some heap allocations by stack allocations. He also fixed the eofbit
10776handling of C++ streams, and removed one division from @file{mpq/aors.c}.
10777
10778David S Miller wrote assembly code for SPARC T3 and T4.
10779
10780Mark Sofroniou cleaned up the types of mul_fft.c, letting it work for huge
10781operands.
10782
10783Ulrich Weigand ported GMP to the powerpc64le ABI.
10784
10785(This list is chronological, not ordered after significance. If you have
10786contributed to GMP but are not listed above, please tell
10787@email{gmp-devel@@gmplib.org} about the omission!)
10788
10789The development of floating point functions of GNU MP 2, were supported in part
10790by the ESPRIT-BRA (Basic Research Activities) 6846 project POSSO (POlynomial
10791System SOlving).
10792
10793The development of GMP 2, 3, and 4.0 was supported in part by the IDA Center
10794for Computing Sciences.
10795
10796The development of GMP 4.3, 5.0, and 5.1 was supported in part by the Swedish
10797Foundation for Strategic Research.
10798
10799Thanks go to Hans Thorsen for donating an SGI system for the GMP test system
10800environment.
10801
10802@node References, GNU Free Documentation License, Contributors, Top
10803@comment node-name, next, previous, up
10804@appendix References
10805@cindex References
10806
10807@c FIXME: In tex, the @uref's are unhyphenated, which is good for clarity,
10808@c but being long words they upset paragraph formatting (the preceding line
10809@c can get badly stretched). Would like an conditional @* style line break
10810@c if the uref is too long to fit on the last line of the paragraph, but it's
10811@c not clear how to do that. For now explicit @texlinebreak{}s are used on
10812@c paragraphs that come out bad.
10813
10814@section Books
10815
10816@itemize @bullet
10817@item
10818Jonathan M. Borwein and Peter B. Borwein, ``Pi and the AGM: A Study in
10819Analytic Number Theory and Computational Complexity'', Wiley, 1998.
10820
10821@item
10822Richard Crandall and Carl Pomerance, ``Prime Numbers: A Computational
10823Perspective'', 2nd edition, Springer-Verlag, 2005.
10824@texlinebreak{} @uref{https://www.math.dartmouth.edu/~carlp/}
10825
10826@item
10827Henri Cohen, ``A Course in Computational Algebraic Number Theory'', Graduate
10828Texts in Mathematics number 138, Springer-Verlag, 1993.
10829@texlinebreak{} @uref{https://www.math.u-bordeaux.fr/~cohen/}
10830
10831@item
10832Donald E. Knuth, ``The Art of Computer Programming'', volume 2,
10833``Seminumerical Algorithms'', 3rd edition, Addison-Wesley, 1998.
10834@texlinebreak{} @uref{https://www-cs-faculty.stanford.edu/~knuth/taocp.html}
10835
10836@item
10837John D. Lipson, ``Elements of Algebra and Algebraic Computing'',
10838The Benjamin Cummings Publishing Company Inc, 1981.
10839
10840@item
10841Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone, ``Handbook of
10842Applied Cryptography'', @uref{http://www.cacr.math.uwaterloo.ca/hac/}
10843
10844@item
10845Richard M. Stallman and the GCC Developer Community, ``Using the GNU Compiler
10846Collection'', Free Software Foundation, 2008, available online
10847@uref{https://gcc.gnu.org/onlinedocs/}, and in the GCC package
10848@uref{https://ftp.gnu.org/gnu/gcc/}
10849@end itemize
10850
10851@section Papers
10852
10853@itemize @bullet
10854@item
10855Yves Bertot, Nicolas Magaud and Paul Zimmermann, ``A Proof of GMP Square
10856Root'', Journal of Automated Reasoning, volume 29, 2002, pp.@: 225-252. Also
10857available online as INRIA Research Report 4475, June 2002,
10858@uref{https://hal.inria.fr/docs/00/07/21/13/PDF/RR-4475.pdf}
10859
10860@item
10861Christoph Burnikel and Joachim Ziegler, ``Fast Recursive Division'',
10862Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022,
10863@texlinebreak{} @uref{https://www.mpi-inf.mpg.de/~ziegler/TechRep.ps.gz}
10864
10865@item
10866Torbj@"orn Granlund and Peter L. Montgomery, ``Division by Invariant Integers
10867using Multiplication'', in Proceedings of the SIGPLAN PLDI'94 Conference, June
108681994. Also available @uref{https://gmplib.org/~tege/divcnst-pldi94.pdf}.
10869
10870@item
10871Niels M@"oller and Torbj@"orn Granlund, ``Improved division by invariant
10872integers'', IEEE Transactions on Computers, 11 June 2010.
10873@uref{https://gmplib.org/~tege/division-paper.pdf}
10874
10875@item
10876Torbj@"orn Granlund and Niels M@"oller, ``Division of integers large and
10877small'', to appear.
10878
10879@item
10880Tudor Jebelean,
10881``An algorithm for exact division'',
10882Journal of Symbolic Computation,
10883volume 15, 1993, pp.@: 169-180.
10884Research report version available @texlinebreak{}
10885@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz}
10886
10887@item
10888Tudor Jebelean, ``Exact Division with Karatsuba Complexity - Extended
10889Abstract'', RISC-Linz technical report 96-31, @texlinebreak{}
10890@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz}
10891
10892@item
10893Tudor Jebelean, ``Practical Integer Division with Karatsuba Complexity'',
10894ISSAC 97, pp.@: 339-341. Technical report available @texlinebreak{}
10895@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz}
10896
10897@item
10898Tudor Jebelean, ``A Generalization of the Binary GCD Algorithm'', ISSAC 93,
10899pp.@: 111-116. Technical report version available @texlinebreak{}
10900@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz}
10901
10902@item
10903Tudor Jebelean, ``A Double-Digit Lehmer-Euclid Algorithm for Finding the GCD
10904of Long Integers'', Journal of Symbolic Computation, volume 19, 1995,
10905pp.@: 145-157. Technical report version also available @texlinebreak{}
10906@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz}
10907
10908@item
10909Werner Krandick and Tudor Jebelean, ``Bidirectional Exact Integer Division'',
10910Journal of Symbolic Computation, volume 21, 1996, pp.@: 441-455. Early
10911technical report version also available
10912@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz}
10913
10914@item
10915Makoto Matsumoto and Takuji Nishimura, ``Mersenne Twister: A 623-dimensionally
10916equidistributed uniform pseudorandom number generator'', ACM Transactions on
10917Modelling and Computer Simulation, volume 8, January 1998, pp.@: 3-30.
10918Available online @texlinebreak{}
10919@uref{http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/mt.pdf}
10920
10921@item
10922R. Moenck and A. Borodin, ``Fast Modular Transforms via Division'',
10923Proceedings of the 13th Annual IEEE Symposium on Switching and Automata
10924Theory, October 1972, pp.@: 90-96. Reprinted as ``Fast Modular Transforms'',
10925Journal of Computer and System Sciences, volume 8, number 3, June 1974,
10926pp.@: 366-386.
10927
10928@item
10929Niels M@"oller, ``On Sch@"onhage's algorithm and subquadratic integer GCD
10930 computation'', in Mathematics of Computation, volume 77, January 2008, pp.@:
10931 589-607, @uref{https://www.ams.org/journals/mcom/2008-77-261/S0025-5718-07-02017-0/home.html}
10932
10933@item
10934Peter L. Montgomery, ``Modular Multiplication Without Trial Division'', in
10935Mathematics of Computation, volume 44, number 170, April 1985.
10936
10937@item
10938Arnold Sch@"onhage and Volker Strassen, ``Schnelle Multiplikation grosser
10939Zahlen'', Computing 7, 1971, pp.@: 281-292.
10940
10941@item
10942Kenneth Weber, ``The accelerated integer GCD algorithm'',
10943ACM Transactions on Mathematical Software,
10944volume 21, number 1, March 1995, pp.@: 111-122.
10945
10946@item
10947Paul Zimmermann, ``Karatsuba Square Root'', INRIA Research Report 3805,
10948November 1999, @uref{https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf}
10949
10950@item
10951Paul Zimmermann, ``A Proof of GMP Fast Division and Square Root
10952Implementations'', @texlinebreak{}
10953@uref{https://homepages.loria.fr/PZimmermann/papers/proof-div-sqrt.ps.gz}
10954
10955@item
10956Dan Zuras, ``On Squaring and Multiplying Large Integers'', ARITH-11: IEEE
10957Symposium on Computer Arithmetic, 1993, pp.@: 260 to 271. Reprinted as ``More
10958on Multiplying and Squaring Large Integers'', IEEE Transactions on Computers,
10959volume 43, number 8, August 1994, pp.@: 899-908.
10960
10961@item
10962Niels M@"oller, ``Efficient computation of the Jacobi symbol'', @texlinebreak{}
10963@uref{https://arxiv.org/abs/1907.07795}
10964@end itemize
10965
10966@node GNU Free Documentation License, Concept Index, References, Top
10967@appendix GNU Free Documentation License
10968@cindex GNU Free Documentation License
10969@cindex Free Documentation License
10970@cindex Documentation license
10971@include fdl-1.3.texi
10972
10973
10974@node Concept Index, Function Index, GNU Free Documentation License, Top
10975@comment node-name, next, previous, up
10976@unnumbered Concept Index
10977@printindex cp
10978
10979@node Function Index, , Concept Index, Top
10980@comment node-name, next, previous, up
10981@unnumbered Function and Type Index
10982@printindex fn
10983
10984@bye
10985
10986@c Local variables:
10987@c fill-column: 78
10988@c compile-command: "make gmp.info"
10989@c End: