Tree @lintian-fixes/main (Download .tar.gz)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 | # Random123: a Library of Counter-Based Random Number Generators <!-- Note that this file is both README.md and the doxygen mainpage. It should be minimally processed to uncomment the @ref directives before doxygen is run on it. --> The Random123 library is a collection of counter-based random number generators (<!-- @ref CBRNG--> "CBRNGs") for CPUs (C and C++) and GPUs (CUDA and OpenCL), as described in <a href="http://dl.acm.org/citation.cfm?doid=2063405"><i>Parallel Random Numbers: As Easy as 1, 2, 3</i>, Salmon, Moraes, Dror & Shaw, SC11, Seattle, Washington, USA, 2011, ACM </a>. They are intended for use in statistical applications and Monte Carlo simulation and have passed all of the rigorous SmallCrush, Crush and BigCrush tests in the <a href="http://www.iro.umontreal.ca/~simardr/testu01/tu01.html"> extensive TestU01 suite</a> of statistical tests for random number generators. They are **not** suitable for use in cryptography or security even though they are constructed using principles drawn from cryptography. The Random123 library is implemented entirely in header files. See [below](#installation-and-testing), for how to install and use the library, and how to generate documentation with doxygen. CBRNGs are as fast as, or faster than conventional RNGs, much easier to parallelize, use minimal memory/cache resources, and require very little code. On modern architectures, the Random123 CBRNGs require a few cycles per byte of random data returned and return random data in convenient sizes (arrays of two or four elements, each of which is an unsigned integer of 32 or 64 bits). The range of random numbers is the full representable range of the 32 or 64 bit unsigned integer) The `<Random123/uniform.h>` header contains utility functions to convert 32- and 64-bit unsigned integers to open or closed ranges of single or double precision floating point numbers. The Random123 library was written by John Salmon and Mark Moraes. It is available at <a href="https://github.com/DEShawResearch/random123"> https://github.com/DEShawResearch/random123</a>. Archived releases are also available from <a href="http://deshawresearch.com/resources_random123.html"> http://deshawresearch.com/resources_random123.html.</a> Please see the <!-- @ref LICENSE--> "LICENSE" for terms and conditions. ## Overview Unlike conventional RNGs, counter-based RNGs are *stateless* functions (or function classes i.e. functors) whose arguments are a *counter*, and a *key* that return a result of the same type as the counter. result = CBRNGname(counter, key) The returned result is a deterministic function of the key and counter, i.e. a unique (counter, key) tuple will always produce the same result. The result is highly sensitive to small changes in the inputs, so that the sequence of values produced by simply incrementing the counter (or key) is effectively indistinguishable from a sequence of samples of a uniformly distributed random variable. For all the CBRNGs in the Random123 library, the result and counter are the same type, specifically an array of *N* words, where words have a width of *W* bits, encapsulated in <!-- @ref arrayNxW--> "r123arrayNxW" structs, or equivalently, for C++, in the <!-- @ref r123::Array1x32--> "ArrayNxW" typedefs in the r123 namespace. Keys are usually also arrayMxW types, but sometimes M is a different size than the counter N (e.g. Philox keys have half the number of elements as the counter, Threefry and ARS have the same number, AES uses an opaque key type rather than an array) The N random numbers returned in `result.v[]` are unsigned integers of width W (32 or 64), and the range of the random numbers is the full range of the unsigned integer of that width (i.e. 0 to 2^W-1) In C++, all public names (classes, structs, typedefs, etc) are in the `r123` namespace. In C, the public names (functions, enums, structs, typedefs) begin either with `r123` or with one of the RNG family names, e.g., `threefry`, `philox`, `ars`, `aesni`. The RNG functions themselves have names like `philox4x32`. C++ class names are capitalized, e.g., `Threefry4x32`. ## <!-- \anchor families--> The different families of Random123 generators Several families of CBRNGs are available in this version of the library: <ul> <li> <!-- @ref ThreefryNxW--> "Threefry" is a **non-cryptographic** adaptation of the Threefish block cipher from the <a href="http://www.skein-hash.info/"> Skein Hash Function</a>. See <!-- @ref--> r123::Threefry2x32, <!-- @ref--> r123::Threefry4x32, <!-- @ref--> r123::Threefry2x64, <!-- @ref--> r123::Threefry4x64. <li> <!-- @ref PhiloxNxW--> "Philox" uses a Feistel network and integer multiplication. See <!-- @ref--> r123::Philox2x32, <!-- @ref--> r123::Philox4x32, <!-- @ref--> r123::Philox2x64, <!-- @ref--> r123::Philox4x64. The Nx64 forms are only available on hardware that supports 64-bit multiplication producing a 128-bit result. <li> <!@ref AESNI--> "AESNI" uses the Advanced Encryption Standard (AES) New Instruction, available on certain modern x86 processors (some models of Intel Westmere and Sandy Bridge, and AMD Interlagos, as of 2011). AESNI CBRNGs can operate on four 32bit words (internally converting them to the 128bit SSE type needed by the AES-NI instructions, or on a single m128i "word", which holds the SSE type. See <!-- @ref--> r123::AESNI4x32, <!-- @ref--> r123::AESNI1xm128i. <li> <!-- @ref AESNI--> "ARS" (Advanced Randomization System) is a **non-cryptographic** simplification of <!-- @ref AESNI--> "AESNI". See <!-- @ref--> r123::ARS4x32_R, <!-- @ref--> r123::ARS1xm128i_R. </ul> ## Installation and Testing The Random123 library is implemented entirely in header files. Thus, there is nothing to compile before using it and nothing to link after you `#include` it in your source files. Simply direct your C or C++ compiler to find the header files in the `include/` directory of the cloned repo and use the Random123 header files, types, and functions in your application. Users and packagers are **STRONGLY ADVISED** run `make check` to compile and run the tests in `tests/` before using Random123 in an application (see <!-- @ref TestsREADME--> "tests/README"). Do not use the library if any tests fail. (It is not a failure for a test to report that it cannot run because of missing hardware capabilities like 64bit multiply, SSE, AES-NI or compiler capabilities) The top-level GNUmakefile also has "install", "html", and "install-html" targets. The former will copy header files to \$(DESTDIR)\$(includedir) (default: /usr/local/include). The second will run doxygen, replacing anything in docs/html. The last will install the documentation in \$(DESTDIR)\$(docdir)/html (default: /usr/local/doc/Random123/html). ## Usage ### C++ API A typical C++ use case might look like: #include <Random123/philox.h> typedef r123::Philox4x32 RNG; RNG rng; RNG::ctr_type c={{}}; RNG::ukey_type uk={{}}; uk[0] = ???; // some user_supplied_seed RNG::key_type k=uk; for(...){ c[0] = ???; // some loop-dependent application variable c[1] = ???; // another loop-dependent application variable RNG::ctr_type r = rng(c, k); // use the random values in r for some operation related to // this iteration on objectid } On each iteration, `r` contains an array of 4 32-bit random values that will not be repeated by any other call to `rng` as long as `c` and `k` are not reused. In the example above, we use the <!-- @ref--> r123::Philox4x32, but any of the other <!-- @ref CBRNG--> "CBRNGs" would serve equally well. Also note that for most CBRNGs, the `ukey_type` and the `key_type` are identical; the code could just as well ignore the `ukey_type` and directly construct the `key_type`. However, for the <!-- @ref AESNI--> "AESNI" CBRNGs, the `key_type` is opaque, and must be constructed from a `ukey_type`, as shown. ### The C API In C, the example above could be written as: #include <Random123/philox.h> philox4x32_ctr_t c={{}}; philox4x32_ukey_t uk={{}}; uk.v[0] = user_supplied_seed; philox4x32_key_t k = philox4x32keyinit(uk); for(...){ c.v[0] = ???; /* some loop-dependent application variable */ c.v[1] = ???; /* another loop-dependent application variable */ philox4x32_ctr_t r = philox4x32(c, k); } In C, access to the contents of the counter and key is through the fixed-size array member `v`. ## The CUDA platform All relevant functions in the C and C++ APIs for Random123 are declared as CUDA device functions if they are included in a CUDA kernel source file and compiled with a CUDA compiler (nvcc). They can be used exactly as described/documented for regular C or C++ programs. It is now possible to use Random123 functions in both the host portion and the device portion of the same .cu source file. The Nx32 forms were faster than the Nx64 variants on 32-bit GPU architectures in 2011, but we haven't measured this recently. It has been reported that Random123 uses 16 bytes of static memory per thread. This is undesirable and not intentional, but we do not have a workaround other than to suggest adjusting memory allocation accordingly. The pi_cuda.cu and pi_cudapp.cu examples illustrate the use of CUDA. In a machine with different GPUs, the R123EXAMPLE_ENVCONF_CUDA_DEVICE environment variable can be set to a unique substring of the CUDA GPU device name to select a specific GPU (else examples try to choose the GPU with the most cores) ## The OpenCL platform The functions in the Random123 C API can all be used in OpenCL kernels, just as in regular C functions. As with CUDA, the Nx32 forms are faster than the Nx64 variants on current (2011) 32-bit GPU architectures. The `pi_opencl.c` and `pi_opencl_kernel.ocl` examples illustrate the use of OpenCL. In a machine with different OpenCL devices, the R123EXAMPLE_ENVCONF_OPENCL_DEVICE environment variable can be set to a unique substring of the OpenCL device name to select a specific OpenCL device (else examples try to choose the device with the most cores) ## C++11 \<random\> interface In addition to the stateless ("pure/functional") C++ API above, the Random123 package includes two C++ classes that leverage the C++11 \<random\> API. <ul> <li>r123::MicroURNG provides an adapter class that provides a more conventional interface compatible with the C++11 URNG (uniform random number generator) API; the MicroURNG adapter can be used with C++11 random number distributions and is fast/lightweight enough that a new MicroURNG can be instantiated with a unique key,counter tuple and used for each call to a distribution, there is little or no overhead to creating billions of unique MicroURNGs. This adapter retains one of the key advantages of CBRNGs -- complete application control over the RNG state. <li>r123::Engine provides the C++11 Random Engine API. This can also be used with any of the C++11 random distributions, but sacrifices the application control over RNG state that is a defining characteristic of CBRNGs. </ul> ## The GNU Scientific Library (GSL) interface In addition to the stateless ("pure/functional") C API above, the Random123 package includes two C adapter interfaces to the <a href="http://www.gnu.org/s/gsl/">GNU Scientific Library (GSL).</a> <ul> <li>The <!--\ref--> GSL_MICRORNG macro allows the application to define a GSL random number generator. It can be used with GSL random distributions but still provides the application with complete control over the RNG state (it is analogous to the MicroURNG class, in that it uses shorter periods, and is intended to be instantiated in large numbers for a few calls to the random distribution). <li>The <!--\ref--> GSL_CBRNG macro allows the application to create a GSL RNG with a completely conventional interface, sacrificing application control over the internal RNG state. </ul> ## Generating uniformly distributed and Gaussian distributed floats and doubles The Random123 library provides generators for uniformly distributed random **integers**. Often, applications want random **real** values or samples from other distributions. The general problem of generating samples from arbitrary distributions is beyond the scope of the Random123 library. One can, of course, use GSL or MicroURNG and the distributions in the C++11 \<random\> library, but a few simple cases are common enough that all that extra machinery seems like overkill. We have included a few generic conversion utilities which developers may find useful. <ul> <li> uniform.hpp - C++ functions that convert random integers to random, uniformly distributed floating point values. <li> u01fixedpt.h - C functions that convert random integers to random, uniformly distributed, equi-spaced, i.e., fixed point, values. <li> boxmuller.hpp - C++ functions that take two uniformly distributed integers (32 or 64 bit) and return a pair of Gaussian distributed floats or doubles. </ul> The Box-Muller method of generating Gaussian random variables is particularly well suited to Random123 because it deterministically consumes exactly two uniform randoms to generate exactly two gaussian randoms. It uses math library functions: sincos, log and sqrt which may be slow on some platforms, but which are surprisingly fast on others. Notably, on GPUs, the lack of branching in the Box-Muller method and hardware support for math functions overcomes the transcendental function overhead, making it the fastest generator of Gaussians that we are aware of. ## Examples The <!-- @ref ExamplesREADME--> "examples/" directory, contains example code intended to illustrate use of the library. Complete, short programs estimate pi by counting the number of random points that fall inside a circle inscribed in a square, demonstrating the C, C++, AES, GSL, OpenCL, CUDA and C++11 APIs. The environment variable R123EXAMPLE_ENVCONF_SEED can be set to any unsigned integer value to run the example with a different seed. Many of the pi_* examples run different numbers of iterations if that number is specified as the first argument on the command line. ### Tests and Benchmarks The <!-- @ref TestsREADME--> "tests/" directory contains tests and benchmarks. This code is complicated due to the fact that it is largely "single source" for CUDA, OpenCL and CPU implmentations. Developers are strongly discouraged from emulating its style. It contains: <ul> <li> Unit tests for individual components and "known-answer-tests", which should be run to ensure that these RNGs build correctly on desired platforms. These help to provide assurance that the code is being compiled correctly. <li> A variety of timing harnesses are provided which measure performance of a variety of generators in different programming environments. </ul> ## Portability Although we have done our best to make Random123 portable and standards conforming, it is an unfortunate fact that there is no portable code. There is only code that has been ported. Prior to release, we test Random123 on a variety of systems and with a variety of toolchains that are readily available to us. Our current test environment includes: <ul> <li> Linux, gcc-5.2.0, 6.3.0, 8.1.0, 10.1.0 using -march=native on Xeon hardware with AES and SSE4_2 and AVX2 support. <li> Linux, gcc-5.2.0 using -m32. <li> Linux, clang-8.0.0 with libc++ (8.0.0) on Xeon hardware. <li> Linux, CentOS7 using the vendor supplied gcc toolchain (4.8.5-16). <li> Linux, Ubuntu 16.04(LTS) using the vendor supplied gcc toolchain (5.4.0-6ubuntu1-16.04.11). <li> Linux, Ubuntu 16.04(LTS) using OpenCL beignet 1.1.1-2 and https://github.com/intel/compute-runtime/releases/tag/19.07.12410 <li> Linux, Ubuntu 18.04(LTS) with clang-11.0.1 and both libc++ and libstdc++. <li> Linux, icc-18.0.3 and 19.1.2.254 on Xeon hardware with AES, SSE4 and AVX2 support. <li> Linux, NVIDIA CUDA 10.0.130 with GTX 980 and 1080, and Titan RTX (aka Turing) hardware. <li> MacOS, with Xcode-10.1 and Metal on a 2018 Mac mini. </ul> In the past, we have tested Random123 with additional toolchains and hardware. Although we no longer test on these platforms, we know of no reason that they should not work. <ul> <li>Linux, gcc (multiple versions from 3.4.3 through 6.3.0), on x86_64. <li>Linux, gcc-4.1.2 and 4.4.1 on i686. <li>Linux, gcc-4.8 on ARMv7 (32bit) Freescale/NXP LS1021A & ARM A53 (64bit) Freescale/NXP 1043A. <li>Linux, clang-2.9, 3.0, 3.1, 3.3 and 3.6 on x86_64. <li>Linux, clang-3.0 and 3.1 with lib++ (2012-04-19 svn checkout) on x86_64. <li>Linux, clang-8.0.0 with libc++ on x86_64 <li>Linux, open64-4.2.4 on x86_64. <li>Linux, Intel icc and icpc 12.0.2 on x86_64. <li>Linux, NVIDIA CUDA 4.1.15, 4.2.6, 5.5.22 and 7.5.1. (NOTE: We recommend against the use of CUDA before 4.1) <li>Linux, OpenCL (NVIDIA SDK 4.0.17) on GTX480, M2090, GTX580 and GTX680 GPUs. <li>Linux, OpenCL (AMD APP SDK 2.4 or 2.5), on x86_64 CPUs and Radeon HD6970 GPUs. <li>Linux, OpenCL (Intel OpenCL 1.5), on x86_64 CPUs. <li>Solaris, both gcc-3.4.3 and Sun C/C++ 5.8, on x86_64. <li>FreeBSD 8.2, gcc-4.2.1, on x86_64. <li>MacOS X 5.8, gcc-4.0.1, on i686. <li>MacOS X 5.8, llvm-2.9.1 on i686 (problems with catching C++ exceptions). <li>Windows 7, Microsoft Visual Studio, version 10.0, Microsoft C/C++ compiler 16.00. </ul> Others have reported success on <ul> <li>MacOS, OpenCL on x86_64 CPUs <li>Linux, gcc-4.7.2 on Powerpc64 (BlueGene/Q) <li>Linux, Portland Group Compiler on Powerpc64 (BlueGene/Q) <li>Linux, IBM xlc on Powerpc64 (BlueGene/Q) <li>MacOS, Metal on x86_64 CPUs and AMD Radeon R9 M380 GPU <li>MacOS Sierra and Scientific Linux with Nvidia GPU and CUDA 8 <li>Linux, on s390x </ul> ## Warnings With some compilation options, the CUDA nvcc compiler warns about unreachable code in array.h. The compiler doesn't recognize that the code that is unreachable for some values of some macro parameters, is actually reachable for other values of the parameters. It is possible to disable that particular warning for a specific compilation unit by adding -Wcudafe --diag_suppress=111 to the compilation command line. On our ARMv7 test platform, we suspect a compiler bug with -O3, which does not seem to affect Random123 code itself, but produces nondeterministic results from time_serial. The offending compiler version was "aarch64-fsl-linux-gcc (Linaro GCC 4.8-2014.04) 4.8.3 20140401 (prerelease)". We slightly reordered a couple of innocuous statements (a timer() and dprintf() call) in time_serial to avoid the bug, but we would avoid -O3 on ARMv7 with that particular version of the compiler at least. ## Contributors We welcome bug reports, bug fixes, ports, general feedback and other enhancements to the [issues](https://github.com/DEShawResearch/random123/issues) and [pull requests](https://github.com/DEShawResearch/random123/pulls) pages of our [github repo](https://github.com/DEShawResearch/random123/). We are grateful for contributed bug-fixes and portability enhancements from the following users: <ul> <li> Geoffrey Irving and Gabriel Rockefeller - BlueGene/Q and powerpc ports <li> Yan Zhou - MacOS and clang ports <li> David Lawrie - allowing 64-bit philox to compile for both host and device with CUDA <li> Bogdan Opanchuk - pointing out the inconsistent rotation constants in the implementation of threefry2xW in version 1.07 and earlier. <li> Tom Schoonjans - Support for Metal (Apple's successor to OpenCL) <li> Karl Magdsick - documentation in uniform.hpp <li> KT Thompson - Visual Studio 2015 and ibm xlc compiler ports </ul> |
Commit History @lintian-fixes/main
0
»»
- routine-update: New upstream version Andreas Tille 2 years ago
- New upstream version 1.14.0+dfsg Andreas Tille 2 years ago
- Fix watchfile to detect new versions on github Andreas Tille 2 years ago
- Upload to unstable Andreas Tille 3 years ago
- dh_missing-friendly install Andreas Tille 3 years ago
- Build-Depends: doxygen Andreas Tille 3 years ago
- Set upstream metadata fields: Bug-Database, Bug-Submit. Andreas Tille 3 years ago
- Use secure URI in Homepage field. Andreas Tille 3 years ago
- Remove duplicate line from changelog. Andreas Tille 3 years ago
- routine-update: Rules-Requires-Root: no Andreas Tille 3 years ago
- routine-update: Add salsa-ci file Andreas Tille 3 years ago
- routine-update: debhelper-compat 13 Andreas Tille 3 years ago
- routine-update: Standards-Version: 4.5.1 Andreas Tille 3 years ago
- Update upstream source from tag 'upstream/1.14.0_rc2+dfsg' Andreas Tille 3 years ago
- New upstream version 1.14.0~rc2+dfsg Andreas Tille 3 years ago
- routine-update: New upstream version Andreas Tille 3 years ago
- Upstream moved code to Github Andreas Tille 3 years ago
- Upload to unstable Andreas Tille 5 years ago
- Standards-Version: 4.3.0 Andreas Tille 5 years ago
- Point Vcs fields to salsa.debian.org Andreas Tille 5 years ago
- debhelper 11 Andreas Tille 5 years ago
- Cleanup d/rules Andreas Tille 5 years ago
- Merge tag 'upstream/1.09+dfsg' Andreas Tille 7 years ago
- New upstream version 1.09+dfsg Andreas Tille 7 years ago
- Remove two unneeded files with unclear license Andreas Tille 7 years ago
- Fix long description Andreas Tille 7 years ago
- Upload to new Andreas Tille 7 years ago
- Initial packaging Andreas Tille 7 years ago
- New upstream version 1.09 Andreas Tille 7 years ago
0
»»