A fast, testless MRG32k3a implementation
This is a fast, testless implementation of Pierre L'Ecuyer's pseudorandom number generator MRG32k3a, based on 64-bit integers.
There are three tests (conditional branches) in the standard implementation of MRG32k3a: one test to correct the combined output, and two tests to correct negative modular residues. In this implementation, the first test is replaced by branch-free arithmetic, and the other two are avoided by ensuring that the argument to the modulo operator is nonnegative, which is possible because the numbers involved are small enough to fit comfortably in a 64-bit integer.
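The following is a minimal sketch of how the three tests could be removed, shown side by side with tested versions for comparison. The function names and the array layout are hypothetical, and the provided code may use a different but equivalent formulation. The constants are the standard MRG32k3a moduli and multipliers.

```c
#include <assert.h>
#include <stdint.h>

/* MRG32k3a moduli and multipliers (negative multipliers stored as magnitudes). */
#define M1   INT64_C(4294967087)  /* 2^32 - 209 */
#define M2   INT64_C(4294944443)  /* 2^32 - 22853 */
#define A12  INT64_C(1403580)
#define A13N INT64_C(810728)
#define A21  INT64_C(527612)
#define A23N INT64_C(1370589)

/* Hypothetical layout: s[0] is the oldest value, s[2] the most recent. */

/* Component 1 with a test: C's % may return a negative residue. */
static int64_t step1_tested(const int64_t s[3]) {
    int64_t p = (A12 * s[1] - A13N * s[0]) % M1;
    if (p < 0) p += M1;
    return p;
}

/* Component 1 without a test: adding A13N * M1 keeps the residue and makes
   the argument nonnegative; it stays below 2^54, so int64_t cannot overflow. */
static int64_t step1_testless(const int64_t s[3]) {
    assert(s[0] >= 0 && s[0] < M1 && s[1] >= 0 && s[1] < M1);
    return (A12 * s[1] + A13N * (M1 - s[0])) % M1;
}

/* Component 2 with a test. */
static int64_t step2_tested(const int64_t s[3]) {
    int64_t p = (A21 * s[2] - A23N * s[0]) % M2;
    if (p < 0) p += M2;
    return p;
}

/* Component 2 without a test, using the same trick with A23N * M2. */
static int64_t step2_testless(const int64_t s[3]) {
    assert(s[0] >= 0 && s[0] < M2 && s[2] >= 0 && s[2] < M2);
    return (A21 * s[2] + A23N * (M2 - s[0])) % M2;
}

/* Combination with a test: the result lies in [1, M1]. */
static int64_t combine_tested(int64_t p1, int64_t p2) {
    int64_t r = p1 - p2;
    if (r <= 0) r += M1;
    return r;
}

/* Combination without a test: (uint64_t)(r - 1) >> 63 is 1 exactly when
   r <= 0, so M1 is added only in that case. */
static int64_t combine_testless(int64_t p1, int64_t p2) {
    int64_t r = p1 - p2;
    return r + M1 * (int64_t)((uint64_t)(r - 1) >> 63);
}
```

Multiplying the combined integer by 1 / (M1 + 1) yields a double in the open interval (0, 1).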
Additionally, the output value is computed from the current state, rather than from the next state, to give the processor a chance to overlap internally the computation of the output value with that of the next state.
On an Intel® Core™ i7-7700 CPU @3.60GHz, the testless implementation is roughly four times faster than the double-based implementation and twice as fast as the trivial 64-bit integer implementation. Please see the comments in the code for more precise data. Note that, for batch generation, nothing can beat the Intel® Math Kernel Library, which uses vectorized instructions.
We provide a C implementation and a Java implementation. The Java documentation can be generated with
javadoc -d docs MRG32k3a.java
The stream generated by this implementation is identical to that of the original one, provided that the initial state is set using the provided methods. Should you manipulate the state directly, you must remember that, since we emit a result using the current state, you need to generate and throw away one output to align the results with L'Ecuyer's original implementation.
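The alignment rule can be illustrated with a small sketch. It assumes a hypothetical state struct and hypothetical function names, and it keeps a test in the output function for readability; it is not the provided code. One generator advances and then emits (the original order), the other emits and then advances (the order used here). After the second one discards a single output, the two streams coincide.

```c
#include <assert.h>
#include <stdint.h>

#define M1   INT64_C(4294967087)
#define M2   INT64_C(4294944443)
#define A12  INT64_C(1403580)
#define A13N INT64_C(810728)
#define A21  INT64_C(527612)
#define A23N INT64_C(1370589)
#define NORM (1.0 / 4294967088.0)  /* 1 / (M1 + 1) */

/* Hypothetical state: s10/s20 are the oldest values, s12/s22 the newest. */
typedef struct { int64_t s10, s11, s12, s20, s21, s22; } mrg_state;

/* Set all six state words directly to the same value. */
static mrg_state mrg_init(int64_t v) {
    assert(v > 0 && v < M2);  /* M2 < M1, so v is valid for both components */
    mrg_state g = { v, v, v, v, v, v };
    return g;
}

/* Move both components one step forward. */
static void advance(mrg_state *g) {
    int64_t p1 = (A12 * g->s11 + A13N * (M1 - g->s10)) % M1;
    int64_t p2 = (A21 * g->s22 + A23N * (M2 - g->s20)) % M2;
    g->s10 = g->s11; g->s11 = g->s12; g->s12 = p1;
    g->s20 = g->s21; g->s21 = g->s22; g->s22 = p2;
}

/* Combine the newest value of each component into a double in (0, 1). */
static double output(const mrg_state *g) {
    int64_t r = g->s12 - g->s22;
    if (r <= 0) r += M1;
    return (double)r * NORM;
}

/* Original order: compute the next state, then emit from it. */
static double next_original_order(mrg_state *g) {
    advance(g);
    return output(g);
}

/* Order used here: emit from the current state, then advance. */
static double next_output_first(mrg_state *g) {
    double u = output(g);
    advance(g);
    return u;
}
```

Starting both generators from the same directly assigned state, a single discarded call to `next_output_first` is enough: from then on, each call returns the same value as `next_original_order`.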
For convenience, we provide a seeding method (uniform across implementations) which uses a 64-bit integer and an underlying SplitMix64 generator.
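As an illustration of the idea, the sketch below fills the six state words from a SplitMix64 stream. The SplitMix64 step is the standard one, but the way its outputs are reduced to the state (a plain remainder, with a retry if a component is all zeros) is an assumption made for this example and not necessarily the mapping used by the provided seeding method. The struct and function names are hypothetical as well.

```c
#include <assert.h>
#include <stdint.h>

#define M1 UINT64_C(4294967087)
#define M2 UINT64_C(4294944443)

/* Hypothetical state layout: three words per component. */
typedef struct { uint64_t s1[3], s2[3]; } mrg_state;

/* Standard SplitMix64 step: advances *x and returns a 64-bit output. */
static uint64_t splitmix64_next(uint64_t *x) {
    uint64_t z = (*x += UINT64_C(0x9e3779b97f4a7c15));
    z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
    return z ^ (z >> 31);
}

/* Assumed mapping: reduce each output modulo the component's modulus.
   A component must not be all zeros, so retry in that (very unlikely) case. */
static mrg_state mrg_seed(uint64_t seed) {
    mrg_state g;
    uint64_t x = seed;
    do {
        for (int i = 0; i < 3; i++) g.s1[i] = splitmix64_next(&x) % M1;
    } while ((g.s1[0] | g.s1[1] | g.s1[2]) == 0);
    do {
        for (int i = 0; i < 3; i++) g.s2[i] = splitmix64_next(&x) % M2;
    } while ((g.s2[0] | g.s2[1] | g.s2[2]) == 0);
    assert(g.s1[2] < M1 && g.s2[2] < M2);
    return g;
}
```

Because the state depends only on the 64-bit seed, the same seed gives the same state in any implementation that uses the same mapping.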