The Hari–Zimmermann generalized SVD for CUDA.
A part of the supplementary material for the paper arXiv:1909.00101 [math.NA].
A reasonably recent (e.g., 10.1) full CUDA installation on a 64-bit Linux or macOS is required.
For the Level 3 (multi-GPU) version, an MPI installation on Linux built with CUDA support (e.g., Open MPI) is also required.
Then, clone and build the JACSD repository in the same parent directory as this one. Only the jstrat library (i.e., libjstrat.a) needs to be built there.
To build the test executable in double or double complex precision, do one of the following, respectively:
cd src && ./mk.sh D SM OPT CVG
cd src && ./mk.sh Z SM OPT CVG
Here, SM is the target GPU architecture (e.g., for a Maxwell card it might be 52, for a Volta one 70), OPT is the optimization level (3 should be fine), and CVG is the algorithm requested.
It is also possible to append clean to the invocation above to remove the executable; such cleanup can also be done manually.
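As a small illustration of the SM argument, the sketch below turns a compute capability string into the value passed to mk.sh. It assumes that SM is simply the compute capability with the dot removed, as the Maxwell example (5.2 giving 52) suggests. The helper name sm_from_compute_capability is hypothetical and not part of this repository.

```python
def sm_from_compute_capability(cc: str) -> str:
    """Convert a compute capability such as "5.2" into an SM value such as "52".

    Assumption: mk.sh expects the major and minor digits concatenated.
    """
    major, minor = cc.strip().split(".")
    # Round-trip through int() so that non-numeric input is rejected early.
    return f"{int(major)}{int(minor)}"


if __name__ == "__main__":
    # The Maxwell example from the text above.
    print(sm_from_compute_capability("5.2"))
```

The compute capability of an installed card is listed in NVIDIA's documentation; the result of the helper would then take the place of SM in the mk.sh invocation.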
For the Level 3 (multi-GPU) version, the installation prefix (e.g., /usr/local) of your MPI distribution has to be provided:
cd src && ./mk.sh Z SM OPT CVG MPI=prefix
Please adjust the compiler and linker flags in the makefile(s) for your particular MPI distribution, since the flags provided there have been tailored for Open MPI.
To run the executable, invoke it as, e.g.:
/path/to/HZ0.exe DEV SNP0 SNP1 ALG MF MG N FN
Here, DEV is the CUDA device number, SNP0 is the inner and SNP1 the outer strategy ID, ALG is 0 for the full block or 8 for the block-oriented variant, MF and MG are the numbers of rows of the first and the second matrix, respectively, N is the number of columns, and FN is the file name prefix (without an extension) of the files containing the input data.
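To make the argument order of the single-GPU invocation concrete, here is a minimal sketch that assembles the command line. The function hz0_command, the positivity check, and all the argument values (including the strategy IDs and the prefix mydata) are placeholders chosen for illustration only, not values recommended by this repository.

```python
def hz0_command(exe, dev, snp0, snp1, alg, mf, mg, n, fn):
    """Return the argument list in the order DEV SNP0 SNP1 ALG MF MG N FN."""
    # Assumption: all matrix dimensions have to be positive.
    for name, value in (("MF", mf), ("MG", mg), ("N", n)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    numeric = [str(x) for x in (dev, snp0, snp1, alg, mf, mg, n)]
    return [exe] + numeric + [fn]


# Placeholder values: device 0, two 1024-row matrices with 512 columns.
cmd = hz0_command("/path/to/HZ0.exe", 0, 0, 0, 8, 1024, 1024, 512, "mydata")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the executable path and the arguments have been replaced by real ones.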
The Level 3 (multi-GPU) executables require a similar invocation:
/path/to/MHZ0.exe SNP0 SNP1 SNP2 ALG MF MG N FN
Here, SNP2 is the outermost strategy ID (mmstep; notice the increments), while the executable itself has to be run with at least two processes using mpiexec or a similar MPI job launcher.
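The multi-GPU launch can be sketched in the same way. The helper below only builds the launcher command and enforces the two-process minimum stated above. It assumes that the launcher accepts the common -n option for the process count (as mpiexec usually does). The name mhz0_launch and all argument values are hypothetical.

```python
def mhz0_launch(exe, nproc, snp0, snp1, snp2, alg, mf, mg, n, fn,
                launcher="mpiexec"):
    """Return the launcher command for SNP0 SNP1 SNP2 ALG MF MG N FN."""
    if nproc < 2:
        raise ValueError("the Level 3 executable needs at least two processes")
    args = [str(x) for x in (snp0, snp1, snp2, alg, mf, mg, n)] + [fn]
    # Assumption: the process count is passed with -n.
    return [launcher, "-n", str(nproc), exe] + args


# Placeholder values only; replace them before an actual run.
mpi_cmd = mhz0_launch("/path/to/MHZ0.exe", 2, 0, 0, 0, 8, 1024, 1024, 512, "mydata")
print(" ".join(mpi_cmd))
```

Job schedulers on clusters often wrap or replace mpiexec, in which case the launcher argument and its options would have to be adapted.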
Data should be contained in two binary, Fortran-array-order (i.e., column-major) files, the second of which is FN.W, where the first one stores the matrix F and the second one the matrix G. Both matrices are either double or double complex, and are expected to have MF (first matrix) or MG (second matrix) rows and N columns.
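For illustration, an input file in the layout described above could be produced as sketched below. The sketch assumes raw, headerless files in the native byte order, with complex entries stored as interleaved (real, imaginary) pairs of doubles. These details, the function name write_fortran_order, and the prefix demo are assumptions rather than something this README specifies.

```python
import os
import struct
import tempfile


def write_fortran_order(path, rows, is_complex=False):
    """Write a matrix, given as a list of rows, in column-major order.

    Returns the number of bytes written (8 per double, 16 per double complex).
    """
    m, n = len(rows), len(rows[0])
    with open(path, "wb") as f:
        for j in range(n):      # columns vary slowest
            for i in range(m):  # rows vary fastest
                if is_complex:
                    z = complex(rows[i][j])
                    f.write(struct.pack("=dd", z.real, z.imag))
                else:
                    f.write(struct.pack("=d", float(rows[i][j])))
    return m * n * (16 if is_complex else 8)


# A tiny stand-in for G with MG = 2 rows and N = 3 columns.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.W")
nbytes = write_fortran_order(demo_path, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(nbytes, os.path.getsize(demo_path))
```

Comparing the file size with MG x N x 8 bytes (or x 16 for double complex) is a cheap sanity check before starting a long run.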
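The output comprises the matrix files, the last of which is FN.Z, for the double or double complex matrices of dimensions MF x N, MG x N, and N x N, respectively; and the vector files, the last of which is FN.SS, for the vectors, the last of which is \Sigma, where all vectors are of length N.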
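Reading the results back might look like the sketch below. It assumes that the output files are raw, headerless, native-endian arrays of doubles in column-major order, and that the vectors are real. Neither the helper names nor the demo files come from this repository, and a double complex FN.Z would need 16 bytes per entry instead.

```python
import os
import struct
import tempfile


def read_doubles(path, count):
    """Read exactly `count` doubles from a raw binary file."""
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) != 8 * count:
        raise ValueError(f"expected {8 * count} bytes, found {len(raw)}")
    return list(struct.unpack(f"={count}d", raw))


def read_fortran_order(path, m, n):
    """Read an m x n column-major matrix of doubles as a list of rows."""
    flat = read_doubles(path, m * n)
    return [[flat[j * m + i] for j in range(n)] for i in range(m)]


# Stand-in files that mimic FN.SS (length N) and a real FN.Z (N x N), N = 2.
demo_dir = tempfile.mkdtemp()
ss_path = os.path.join(demo_dir, "demo.SS")
z_path = os.path.join(demo_dir, "demo.Z")
with open(ss_path, "wb") as f:
    f.write(struct.pack("=2d", 3.0, 0.5))
with open(z_path, "wb") as f:
    f.write(struct.pack("=4d", 1.0, 2.0, 3.0, 4.0))

sigma = read_doubles(ss_path, 2)
z_matrix = read_fortran_order(z_path, 2, 2)
print(sigma, z_matrix)
```

The size check in read_doubles catches a mismatch between the assumed N (or precision) and the actual file contents.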
See also the FLAPWxHZ repository for more explanation.
This work has been supported in part by the Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).