Thomas Wang's random number generation function, implicitly parallelized & pipelined, at a speed of 0.6 cycles per 32bit integer.

tugrul512bit/oofrng

oofrng

Thomas Wang's random number generation function, implicitly parallelized & pipelined, at a speed of:

uint32_t between [0 and 2^32-1)

  • 0.53 cycles per 32bit integer for Xeon Gold 5215 2.5GHz (1 thread, AVX512).
    • (with -O3 -march=native -mavx512f -ffast-math -fno-math-errno compiler flags used)
  • 1.28 cycles per 32bit integer for Fx8150 (1 core/1 module, AVX)
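
For readers unfamiliar with the underlying function, here is a minimal sketch of the widely circulated form of Thomas Wang's 32-bit integer hash, applied to a fixed-width block of independent states. It is not taken from this repository: `fill_block` and the `Lanes` parameter are hypothetical names, and feeding each state back through the hash is an assumption about how such a generator could be built. The point is only that a plain loop over independent lanes has no cross-iteration dependency, so a compiler is free to vectorize and pipeline it.

```cpp
#include <cstddef>
#include <cstdint>

// Thomas Wang's 32-bit integer hash (commonly cited form).
inline uint32_t wang_hash(uint32_t a) {
    a = (a ^ 61u) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2du;
    a = a ^ (a >> 15);
    return a;
}

// Hypothetical helper (not the library's API): advance Lanes independent
// states by one step each and write the new values to out[0..Lanes-1].
// Each iteration touches only its own lane, so the loop can be vectorized.
template <std::size_t Lanes>
void fill_block(uint32_t (&state)[Lanes], uint32_t* out) {
    for (std::size_t i = 0; i < Lanes; ++i) {
        state[i] = wang_hash(state[i]);
        out[i] = state[i];
    }
}
```

Whether the loop actually compiles to AVX or AVX512 instructions depends on the compiler and flags, so checking the generated assembly is the only reliable confirmation.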

uint32_t between [0 and non-2^k)

  • 2.1 cycles per 32bit integer for Xeon Gold 5215 2.5GHz (1 thread, AVX512).
    • (with -O3 -march=native -mavx512f -ffast-math -fno-math-errno compiler flags used)
  • 4.5 cycles per 32bit integer for Fx8150 (1 core/1 module, AVX)
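
One common way to map a full-range 32-bit value onto an arbitrary upper limit is the multiply-and-shift reduction sketched below. This is a swapped-in, well-known technique shown for illustration; the README does not say which method the library uses, and `reduce_to_range` is a hypothetical name. The result is slightly biased whenever the limit does not divide 2^32.

```cpp
#include <cstdint>

// Map x from [0, 2^32) to [0, limit) using a 64-bit multiply and a shift.
// Hypothetical helper; not necessarily what oofrng does internally.
inline uint32_t reduce_to_range(uint32_t x, uint32_t limit) {
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * limit) >> 32);
}
```

The extra widening multiply per element is one plausible reason a bounded range costs more cycles than the raw output.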

uint32_t between [0 and 2^k)

  • 0.76 cycles per 32bit integer for Xeon Gold 5215 2.5GHz (1 thread, AVX512).
    • (with -O3 -march=native -mavx512f -ffast-math -fno-math-errno compiler flags used)
  • 1.8 cycles per 32bit integer for Fx8150 (1 core/1 module, AVX)
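
When the limit is a power of two, the reduction can be a single bit mask, which is consistent with this case being cheaper than the general one. The sketch below assumes `limit` is a nonzero power of two; `reduce_pow2` is a hypothetical name, and the library may implement this differently.

```cpp
#include <cstdint>

// Map x to [0, limit) where limit is a nonzero power of two.
// Keeps only the low log2(limit) bits of x.
inline uint32_t reduce_pow2(uint32_t x, uint32_t limit) {
    return x & (limit - 1u);
}
```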

float between [0 and 1)

  • 1.12 cycles per 32bit float for Xeon Gold 5215 2.5GHz (1 thread, AVX512).
  • 3 cycles per 32bit float for Fx8150 (1 core/1 module, AVX)
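
A typical way to turn a 32-bit integer into a float in [0, 1) is to keep the top 24 bits (the precision of a float's significand) and scale by 2^-24. The sketch below shows that standard conversion as an assumption about how such output could be produced; `to_unit_float` is a hypothetical name and the library's own conversion may differ.

```cpp
#include <cstdint>

// Convert a 32-bit integer to a float in [0, 1).
// x >> 8 is at most 2^24 - 1, which a float represents exactly,
// so the result never rounds up to 1.0f.
inline float to_unit_float(uint32_t x) {
    return static_cast<float>(x >> 8) * (1.0f / 16777216.0f);
}
```

Multiplying the result by an upper limit gives a value scaled to that limit, although rounding in the final multiply can in rare cases land exactly on the limit.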

float between [0 and k)

  • 1.12 cycles per 32bit float for Xeon Gold 5215 2.5GHz (1 thread, AVX512).
  • 3 cycles per 32bit float for Fx8150 (1 core/1 module, AVX)

constexpr int n = 1024*16;

// 64 is the internal width of vectorization 
// (can be set to power of 2 greater than or equal to 2)
oofrng::Generator<64> gen;

// to help compiler use aligned vector instructions
alignas(4096)
uint32_t r[n]; // float is supported too

// 3409 nanoseconds to fill n-element array with random numbers 
// (or 4.8 Giga-integers per second, on Xeon Gold 5215)
gen.generate(r,n); 

// fill n elements again, but with upper limit (not inclusive)
gen.generate(r,n,3.14f);
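
The timing quoted in the comments above can be related to the cycles-per-integer figures with a little arithmetic. The helpers below are hypothetical and assume the CPU runs at its nominal 2.5 GHz for the whole measurement, which turbo or throttling could change.

```cpp
// Elements produced per nanosecond (numerically equal to giga-elements per second).
constexpr double elements_per_ns(double elements, double nanoseconds) {
    return elements / nanoseconds;
}

// Cycles spent per element at a given clock (GHz is cycles per nanosecond).
constexpr double cycles_per_element(double clock_ghz, double elements,
                                    double nanoseconds) {
    return clock_ghz * nanoseconds / elements;
}

// Figures from the usage example: 1024*16 elements in 3409 ns at 2.5 GHz.
constexpr double throughput = elements_per_ns(1024.0 * 16.0, 3409.0);
constexpr double cycles = cycles_per_element(2.5, 1024.0 * 16.0, 3409.0);
```

With these inputs the throughput comes out at about 4.8 elements per nanosecond and the cost at about 0.52 cycles per element, in line with the figure listed for the Xeon Gold 5215.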
