The vectorized (AVX-512) batched singular value decomposition algorithm for matrices of order two.
This software is supplementary material for the paper doi:10.1142/S0129626420500152 (arXiv:2005.07403 [cs.MS]).
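For readers who want a quick reference result to compare against, the following is a minimal sketch of a textbook closed-form SVD of one real 2x2 matrix. It is not the algorithm from the paper: it is scalar, handles a single matrix, and makes no attempt at scaling to avoid overflow or underflow. The function name `svd2x2` and the return layout are hypothetical, chosen only for this illustration.

```python
import math

def svd2x2(a, b, c, d):
    """SVD of the real matrix [[a, b], [c, d]] (textbook closed form).

    Returns (u, s, vt) such that A = U * diag(s) * V^T,
    with s[0] >= s[1] >= 0. Hypothetical helper, not part of this software.
    """
    e = (a + d) / 2.0
    f = (a - d) / 2.0
    g = (c + b) / 2.0
    h = (c - b) / 2.0
    q = math.hypot(e, h)
    r = math.hypot(f, g)
    s1 = q + r
    s2 = q - r                      # negative exactly when det(A) < 0
    a1 = math.atan2(g, f)
    a2 = math.atan2(h, e)
    theta = (a2 - a1) / 2.0
    phi = (a2 + a1) / 2.0
    # If s2 < 0, take its absolute value and move the sign into V^T.
    sign = -1.0 if s2 < 0.0 else 1.0
    u = [[math.cos(phi), -math.sin(phi)],
         [math.sin(phi), math.cos(phi)]]
    vt = [[math.cos(theta), -math.sin(theta)],
          [sign * math.sin(theta), sign * math.cos(theta)]]
    return u, (s1, abs(s2)), vt

if __name__ == "__main__":
    u, s, vt = svd2x2(1.0, 2.0, 3.0, 4.0)
    print("singular values:", s)
```

Multiplying the three factors back together should reproduce the input up to rounding, which makes for a simple sanity check on small, well-scaled inputs.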
A recent Intel C compiler on a 64-bit Linux (e.g., CentOS 7.8) is required. The Intel MKL (Math Kernel Library) is recommended, but another LAPACK library could work with some tweaking.
Run make in the src subdirectory as follows:
make [COMPILER=x64x|x200|x64] [MARCH=...] [NDEBUG=optimization_level] [TEST=0..15] [all|clean|help]
where COMPILER should be set to x64x for Xeons, or to x200 for Xeon Phi KNLs.
Here, NDEBUG should be set to the desired optimization level (3 is a sensible choice).
If unset, the predefined debug-mode build options will be used.
For testing, TEST=0 builds the vectorized code, and TEST=4 builds the pointwise code.
Adding two to TEST enables the optional backscaling, while adding one enables the step-by-step printouts.
Adding eight to TEST turns on tracking of the IA32_MPERF and IA32_APERF MSRs (this requires running the executables as root).
For example, make COMPILER=x200 NDEBUG=3 clean all will trigger a full, release-mode rebuild for the KNLs of the vectorized code only (equivalent to TEST=0).
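The TEST values described above add up like independent bit flags (1, 2, 4, 8). The sketch below decodes and composes them under that reading. The function and key names are hypothetical, introduced only for illustration, and are not part of the build system.

```python
def describe_test(test):
    """Decode a TEST value (0..15) into the options it is assumed to select."""
    if not 0 <= test <= 15:
        raise ValueError("TEST must be in 0..15")
    return {
        "printouts": bool(test & 1),      # +1: step-by-step printouts
        "backscaling": bool(test & 2),    # +2: optional backscaling
        "pointwise": bool(test & 4),      # +4: pointwise instead of vectorized code
        "msr_tracking": bool(test & 8),   # +8: IA32_MPERF / IA32_APERF tracking (needs root)
    }

def compose_test(printouts=False, backscaling=False, pointwise=False, msr_tracking=False):
    """Build a TEST value from the individual options."""
    return (1 * printouts) + (2 * backscaling) + (4 * pointwise) + (8 * msr_tracking)

if __name__ == "__main__":
    print(compose_test(pointwise=True, backscaling=True))  # 6
    print(describe_test(6))
```

Under this reading, TEST=6 would select the pointwise code with backscaling and without printouts or MSR tracking.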
To write N finite pseudorandom doubles into the file FileName, run:
./src/rndgen.exe N FileName
To test the real (or the complex, in the second line) algorithm T, where T=TEST, on N vectors from FileName, run:
./src/d8svd2tT.exe N FileName
./src/z8svd2tT.exe N FileName
To test the real (or the complex, in the second line) algorithm T, where T=TEST, on #batches batches, each with n matrices read from infile, run:
./src/dbatchT.exe n #batches infile
./src/zbatchT.exe n #batches infile
For now, n has to be a power of two. This is a constraint of the error testing procedure only, not of the algorithm itself.
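A small wrapper can catch an invalid n before the batch tester is launched. The sketch below checks the power-of-two requirement and assembles the command line in the argument order shown above. It assumes that T in the executable name is replaced by the decimal TEST value; the helper names are hypothetical.

```python
def is_power_of_two(n):
    """True if n is a positive integer power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0

def batch_command(test, n, batches, infile, complex_case=False):
    """Return the argument list for the batch tester, validating n first.

    Assumption: the executable is named dbatchT.exe or zbatchT.exe with
    T replaced by the decimal TEST value used at build time.
    """
    if not 0 <= test <= 15:
        raise ValueError("TEST must be in 0..15")
    if not is_power_of_two(n):
        raise ValueError("n must be a power of two")
    if batches < 1:
        raise ValueError("at least one batch is required")
    prefix = "z" if complex_case else "d"
    return ["./src/%sbatch%d.exe" % (prefix, test), str(n), str(batches), infile]

if __name__ == "__main__":
    print(" ".join(batch_command(0, 1024, 16, "infile")))
```

The returned list can be passed directly to `subprocess.run` once the executables have been built.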
This work has been supported in part by the Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).