Codebase list ntl / upstream/11.0.0 doc / tour-impl.html
upstream/11.0.0

Tree @upstream/11.0.0 (Download .tar.gz)

tour-impl.html @upstream/11.0.0raw · history · blame

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
<html>
<head>
<title>
A Tour of NTL: NTL Implementation and Portability  </title>
</head>

<center>
<a href="tour-tips.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a>
 <a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a> 
<a href="tour-gmp.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a>
</center>

<h1> 
<p align=center>
A Tour of NTL: NTL Implementation and Portability 
</p>
</h1>

<p> <hr> <p>

NTL is designed to be portable, fast,
and relatively easy to use and extend.

<p>
To make NTL portable, no assembly code is used (well, almost none, see below).
This is highly desirable, as architectures are constantly
changing and evolving, and maintaining assembly
code is quite costly.
By avoiding assembly code, NTL should remain usable,
with virtually no maintenance, for many years.

<p>

<h3>Minimal platform requirements</h3>

When the configuration flag <tt>NTL_CLEAN_INT</tt>
is <i>on</i> (this is not the default,
see below),
NTL makes two requirements
of its platform,
neither of which are guaranteed by the <tt>C++</tt> language
definition, but are essentially universal:

<ol>
<li>
<tt>int</tt> and <tt>long</tt> quantities, respectively, 
are represented using a 2's complement
representation whose width is equal to the width of <tt>unsigned int</tt>
and <tt>unsigned long</tt>, respectively.
<li>
Double precision floating point
conforms to the IEEE floating point standard.
</ol>

<p>
NTL makes very conservative requirements of the <tt>C++</tt> compiler:
<ul>
<li>
it is assumed that the <tt>C++</tt> compiler conforms to the 1998 standard,
including the basic of templates;
<li>
it does not assume any features not in the 1998 standard,
unless compiled with a flag that requires features
from the 2011 standard
(<tt>NTL_THREADS</tt>,  
<tt>NTL_EXCEPTIONS</tt>,
<tt>NTL_SAFE_VECTORS</tt>)
or a flag that explicitly requests a standard
(<tt>NTL_STD_CXX11</tt>,
<tt>NTL_STD_CXX14</tt>).
</ul>

<p>
At some point in the future, it is expected that
NTL will move to requiring the 2011 standard or later.


<p>

<h3>The <tt>NTL_CLEAN_INT</tt> flag</h3>

<p>

The configuration flag <tt>NTL_CLEAN_INT</tt> 
is currently <i>off</i> by default.

<p>
When this flag is off, NTL makes a couple of
other requirements of its platform.
<p>
First, that conversions from <tt>unsigned long</tt> to <tt>long</tt> 
convert the bit pattern without change to the corresponding 2's complement 
signed integer.
Note that the <tt>C++</tt> standard defines the behavior of 
converting unsigned
to signed values as <i>implementation defined</i> when the value
cannot be represented in the range of nonnegative signed values.
Nevertheless, this behavior is essentially universal, and more importantly,
is is not <i>undefined behavior</i>: implementation-defined behavior must be
documented and respected by the compiler, while undefined behavior can
be exploited by the compiler in some surprising ways.


<p>
Second, 
right shifts of signed integers are consistent,
in the sense that if it is sometimes an arithmetic shift,
then it is always an arithmetic shift (the installation
scripts check if right shift appears to be arithmetic, and if so,
this assumption is made elsewhere). 
Arithmetic right shift is also <i>implementation defined</i> behavior
that is essentially universal.


<p>
It seems fairly unlikely that one would ever have to turn the
<tt>NTL_CLEAN_INT</tt> flag <i>on</i>, but it seems a good idea
to make this possible, and at the very least 
to identify and isolate the code that
relies on these asumptions.
Actually, the most recent versions of NTL (especially
since v10.0), there is very
little such code remaining, and it is not really all that critical
to performance any more.
Eventually, all such code may disappear completely.



<h3>Some floating point issues</h3>


<p>
NTL uses double precision floating point arithmetic in a number of places,
including a number of exact computations, where one might
not expect to see floating point.
Relying on floating point may seem prone to errors,
but with the guarantees provided by the IEEE 754 standard,
one can prove the correctness of the NTL code that uses floating point.







<p>
Generally, NTL assumes that the IEEE standard is 
correctly implemented.
In particular,
it assumes that the <i>compiler 
issues instructions that respect the
grouping of floating point operations.</i>
For example, the compiler is not allowed to
compile <tt>(a+b)+c</tt> as <tt>a+(b+c)</tt>,
or <tt>a*b + a*c</tt> as <tt> a*(b+c)</tt>.

<p>
By default, most compilers will satisfy this assumption
at normal optimization levels (for example, at the <tt>-O2</tt>
optimization level of most compilers).
One should avoid higher optimization levels like <tt>-O3</tt>
or special flags like <tt>-Ofast</tt>.
In practice, these optimization flags do not help much,
and may result in incorrect code.
One important compiler which does not satisfy this assumption
by default is 
Intel's <tt>icc</tt> compiler:
one has to pass the
<tt>-fp-model precise</tt> flag to <tt>icc</tt>
to correct this problem.
NTL's configuration script will take care of this
automatically, 
by including  <tt>-fp-model precise</tt>
in the <tt>CXXAUTOFLAGS</tt> make variable,
so you shouldn't have to worry about it.

<p>
It should be said that most floating point code in
NTL is quite robust, and will still work correctly
under somewhat weaker assumptions.
Moreover, NTL's header files are structured so that
programs that use NTL do not need to be compiled
in a way that satisfies these assumptions.
For example, NTL client code could be compiled
using <tt>icc</tt>'s defaults or using <tt>gcc</tt>
with the <tt>-Ofast</tt> flag.

<p>
There are a few other issues to address.

<ul>
<li>
<i>Extended precision.</i>
Standard compliant
compilers
may compute intermediate results
in extended precision.
The only machine for which the extended precision is
an issue are old (pre-SSE) <tt>x86</tt> machines
that rely exclusively on the <tt>x87</tt> coprocessor
instruction set.

<p>
<li>
<i>
Contractions and FMA.
</i>
Standard compliant
compilers
may "contract" certain sequences of operations
into a single operation
with just one rounding.
The main issue here is that
many newer machines have a "fused multiply-add (FMA)" instruction
that computes <tt>a*b+c</tt> with just a single rounding,
and compilers may choose generate this instruction. 

<p>
Except for one module, NTL works perfectly well
even with extended precision and/or contractions.
The one exception is the
<tt>quad_float</tt>,
which requires strict adherence 
to the above floating point assumptions, and in addition,
requires that all computations are carried
out without extended precision or contractions.
NTL goes to great lengths to detect and work around
these issues, so you should not have to worry about it.
More details can be found in 
<a href="quad_float.cpp.html">quad_float.txt</a>.


<p>
<li>
<i>
Special values.
</i>
The compiler
should handle
special values
(infinities, NaN's, signed zeroes, denormalized numbers)
correctly.


<p>
NTL does not rely very much on these
for its own computations.
For certain operations (like conversions) it assumes
that and infinities and NaN's behave as expected
so they can be detected.
It in some bounds checking code, it assumes that infinities
basically work the right way.
NTL does not make any assumptions about signed zeros
or denormalized numbers.


<p>
<li>
<i>
Conversions.
</i>
The compiler
should convert double's to long's
by truncation.
It should should convert long's to double's
exactly if possible, 
and otherwise,
to either
the next lower or higher representable value.
This behavior is required by the <tt>C++</tt> standard.

<p>
<li>
<i>
Rounding modes.
</i>
On some platforms, it is possible to dynamically change
the rounding mode from the default round-to-nearest mode to
other modes (e.g., round to zero).
<p>
The core algorithms in NTL that use floating point to 
assist in computing exact integer results are conservatively
designed so that they will work in any rounding mode.
However, it is not recommended to run NTL code in non-default
rounding modes.
First, NTL has not been thoroughly tested in these other modes.
Second, some higher-level floating-point code (such as <tt>quad_float</tt>) 
may not give expected results in other modes.


</ul>
<p>



<p>
<h3>Algorithms</h3>
<p>
NTL makes fairly consistent use of asymptotically fast algorithms.

<p>
Long integer multiplication is implemented using the classical
algorithm, crossing over to Karatsuba for very big numbers.
Long integer division is currently only implemented using
the classical algorithm -- unless you use NTL with GMP (version 3 or later),
which
employs an algorithm that is about twice as slow as multiplication
for very large numbers.
<p>
Polynomial multiplication and division is carried out
using a combination of the classical algorithm, Karatsuba,
the FFT using small primes, and the FFT using the Schoenhagge-Strassen
approach.
The choice of algorithm depends on the coefficient domain.
<p>
Many algorithms employed throughout NTL are inventions
of the author (<a href="http://www.shoup.net">Victor Shoup</a>) 
and his colleagues 
<a href="http://math-www.uni-paderborn.de/~aggathen/joachim.html">Joachim von zur Gathen</a>
and
<a href="http://www4.ncsu.edu/~kaltofen">Erich Kaltofen</a>,
as well as <a href="mailto:abbott@dima.unige.it">John Abbott</a>
and
<a href="http://www.loria.fr/~zimmerma">Paul Zimmermann</a>.

<p>
<h3>
<a name="threadimpl">
Thread safety
</h3>
<p>
As of v7.0, NTL is thread safe.
That said, there are several things to be aware of:
<ul>

<li>
To use this feature, you have to enable <tt>NTL_THREADS</tt>
in the configuration script.
Also, you will need a compiler and runtime library that 
implements several key <tt>C++11</tt> features,
including <tt>thread_local</tt> storage.
<ul>
<li>
NOTE: as of v9.8, the requirements have been relaxed, so that
for gcc and gcc-compatible compilers
(such as clang and icc) only support of the gcc <tt>__thread</tt>
storage specifier is required.
<li>
With these relaxed requirements, it is possible to build
a thread safe version of NTL on Linux using gcc 4.8 and above,
or on Mac OSX 10.10  and above.

</ul>

<p>
<li>
Prior to v10.0 of NTL, you had to use NTL with GMP to get thread safety.
Since v10.0, this requirement has been dropped.


<p>
<li>
As of v11.0, <tt>NTL_THREADS</tt> is on by default.


<p>
<li>
If you use the external gf2x library, you must 
use version 1.2 of that library or later.
Earlier versions are not thread safe.
</ul>

To obtain thread safety, I used the following strategies:
<ul>
<p>
<li>
In places where NTL's interface demands global variables,
such as the "current modulus" for the <tt>ZZ_p</tt>
class, these global variables have been made <i>thread local</i>.
So, you can pass around various <tt>ZZ_pContext</tt> objects
among threads, and individual threads can install these locally.
Thus, different threads can concurrently use the same or different
moduli, and it all just works, with no changes to NTL's interface.

<p>
<li>
In places where NTL used static variables to hold on to space
for scratch variables, I make these variables <i>thread local</i>,
and I also make sure the storage used by these variables
get released when the thread terminates.
In all NTL builds (thread safe or not), 
I try to make sure that fairly large chunks of memory get released immediately.

<p>
<li>
In places where NTL uses a lazy strategy to build various tables
(such as FFT primes), I uses a "double checked locking" strategy
to grow these tables in a way that (a) the tables can be
shared among different threads, and (b) taking a lock
on a <tt>mutex</tt> is very rare.
The new <tt>C++11</tt> concurrent memory model is essential here.

<p>
<li>
Smart pointers (for things like <tt>ZZ_pContext</tt>'s) are 
designed to do the necesary reference counting in a thread-safe
manner.

<p>
<li>
For psuedo-random number generation, 
the internal state of the PRG 
is thread local,
and the default initial seed is guaranteed to be unique
among all threads in a given process (and an attempt is made to
make the seed globally unique among all processes and threads,
but this is hard to do in a completely portable way).


</ul>

The overall structure of the code has been modified so that
the code base is nearly identical for regular and thread-safe builds:
there are just a few <tt>ifdef</tt>'s on the <tt>NTL_THREADS</tt>
flag.


<p>
<h3>
Thread Boosting
</h3>
<p>

As of v9.5.0, NTL provides a <i>thread boosting</i> feature.
With this feature, certain code within NTL will use available
threads to speed up computations on a multicore
machine.
This feature is enabled by setting <tt>NTL_THREAD_BOOST=on</tt>
during configuration.
See <a href="BasicThreadPool.cpp.html">BasicThreadPool.txt</a>
for more information.

<p>
As of v11.0, <tt>NTL_THREAD_BOOST</tt> is on by default.


<p>
This feature is a work in progress.
As time goes on, more and more code within NTL is being thread boosted.


<p>
<h3>
Error Handling and Exceptions
</h3>
<p>

As of v8.0, NTL provides error handling through exceptions.
To enable exptions, you have to configure
NTL with <tt>NTL_EXCEPTIONS</tt> flag turned on.
By default, exceptions are not enabled, and NTL
reverts to its old error handling method:
abort with an error message. 

<p>
If exceptions are enabled, then instead of aborting your
program, and appropriate exception is thrown.
More details ion the programming interface
of this feature are available <a href="tour-struct.html#except">here</a>.

<p>
If you enable exceptions, you must use a <tt>C++11</tt> compiler.
Specifically, your compiler will need support for lambdas
(which are used to conveniently implement the "scope guard" idiom),
and your compiler should implement the new default exception
specification semantics (namely, that destructors are "noexcept" by 
default).

<p>
Implementation of this required a top-to-bottom scrub of NTL's code,
replacing a lot of old-fashioned code with more modern, RAII-oriented
code (RAII = "resource acquisition is initialization").





<p>

<center>
<a href="tour-tips.html"><img src="arrow1.gif" alt="[Previous]" align=bottom></a>
 <a href="tour.html"><img src="arrow2.gif" alt="[Up]" align=bottom></a> 
<a href="tour-gmp.html"> <img src="arrow3.gif" alt="[Next]" align=bottom></a>
</center>


</body>
</html>