The Hari–Zimmermann generalized SVD for CUDA.



A part of the supplementary material for the paper arXiv:1909.00101 [math.NA].



A reasonably recent (e.g., 10.1) full CUDA installation on a 64-bit Linux or macOS is required.

For the Level 3 (multi-GPU) version, an MPI installation on Linux, built with CUDA support (e.g., Open MPI), is also required.

Then clone and build the JACSD repository so that it shares the same parent directory as this one. Only the jstrat library (i.e., libjstrat.a) needs to be built there.

Make options

To build the test executable in double precision, do the following:

cd src

or, for double complex,

cd src

where SM is the target GPU architecture (e.g., for a Maxwell card it might be 52, for a Volta one 70, etc.), OPT is the optimization level (3 should be fine), and CVG is the algorithm requested (0, 1, 2, 3, 4, 5, 6, or 7).

Appending clean to the invocation above removes the executable; the cleanup can also be done manually.

For the Level 3 (multi-GPU) version, the prefix (e.g., /usr/local) of your MPI distribution has to be provided:

cd src
./ Z SM OPT CVG MPI=prefix

Please adjust the compiler and linker flags in the makefile(s) for your particular MPI distribution, since the flags provided there have been tailored for Open MPI.


Command line

To run the executable, use, e.g.,

/path/to/HZ0.exe DEV SNP0 SNP1 ALG MF MG N FN

where DEV is the CUDA device number, SNP0 is the inner and SNP1 the outer strategy ID (2 for cycwor or 4 for mmstep), ALG is 0 for full block or 8 for block-oriented, MF and MG are the numbers of rows of the first and the second matrix, respectively, N is the number of columns of both matrices, and FN is the file name prefix (without an extension) of the files containing the input data.
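The argument list above can be checked before a run with a small wrapper. The following is only a sketch of such validation, not the parser used by the executable: the HzArgs structure and the parse_hz_args function are hypothetical names, and the requirement that DEV be non-negative and that MF, MG, and N be positive is an assumption rather than something stated here.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical container for the eight documented arguments.
struct HzArgs {
  int dev;         // DEV: CUDA device number
  int snp0, snp1;  // inner and outer strategy IDs (2 = cycwor, 4 = mmstep)
  int alg;         // ALG: 0 = full block, 8 = block-oriented
  long mf, mg, n;  // rows of the first matrix, rows of the second, columns
  std::string fn;  // FN: file name prefix, without an extension
};

// The two strategy IDs listed for SNP0 and SNP1.
static bool is_strategy(int id) { return id == 2 || id == 4; }

// args[0] is the program name; args[1..8] are DEV SNP0 SNP1 ALG MF MG N FN.
// std::stoi/std::stol throw std::invalid_argument on non-numeric input.
HzArgs parse_hz_args(int count, const char* const* args) {
  if (count != 9)
    throw std::invalid_argument("expected: DEV SNP0 SNP1 ALG MF MG N FN");
  HzArgs a;
  a.dev = std::stoi(args[1]);
  a.snp0 = std::stoi(args[2]);
  a.snp1 = std::stoi(args[3]);
  a.alg = std::stoi(args[4]);
  a.mf = std::stol(args[5]);
  a.mg = std::stol(args[6]);
  a.n = std::stol(args[7]);
  a.fn = args[8];
  if (a.dev < 0)  // assumption: device numbers start at 0
    throw std::invalid_argument("DEV must be non-negative");
  if (!is_strategy(a.snp0) || !is_strategy(a.snp1))
    throw std::invalid_argument("SNP0 and SNP1 must be 2 (cycwor) or 4 (mmstep)");
  if (a.alg != 0 && a.alg != 8)
    throw std::invalid_argument("ALG must be 0 (full block) or 8 (block-oriented)");
  if (a.mf <= 0 || a.mg <= 0 || a.n <= 0)  // assumption: positive dimensions
    throw std::invalid_argument("MF, MG, and N must be positive");
  return a;
}
```

Calling parse_hz_args(argc, argv) from a wrapper's main would reject a malformed command line before any GPU work starts.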

The Level 3 (multi-GPU) executables require a similar invocation:

/path/to/MHZ0.exe SNP0 SNP1 SNP2 ALG MF MG N FN

where SNP2 is the outermost strategy ID (3 for cycwor or 5 for mmstep; notice the increments), while the executable itself has to be run with at least two processes using mpiexec or a similar MPI job launcher.
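The increment in the strategy IDs can be captured in a tiny lookup. This is a sketch only: strategy_id is a hypothetical helper, and it assumes that SNP0 and SNP1 keep the values 2 and 4 in the multi-GPU invocation, with only SNP2 using the incremented IDs.

```cpp
#include <stdexcept>
#include <string>

// Map a strategy name to the ID expected on the command line.
// The IDs are the ones listed in this README; the function is hypothetical.
int strategy_id(const std::string& name, bool outermost) {
  int base;
  if (name == "cycwor")
    base = 2;
  else if (name == "mmstep")
    base = 4;
  else
    throw std::invalid_argument("unknown strategy: " + name);
  // The outermost (SNP2) IDs are the inner/outer IDs incremented by one.
  return outermost ? base + 1 : base;
}
```

For instance, strategy_id("mmstep", true) gives the value to pass as SNP2 when mmstep is wanted at the outermost level.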

Data format

The input data should be stored in the binary files FN.Y and FN.W, in Fortran (column-major) array order. The first file holds the matrix F, with MF rows, and the second one the matrix G, with MG rows; both matrices have N columns and are either double or double complex.
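A minimal sketch of producing such input files for the double case follows. It assumes that the files are raw arrays with no header, written in the native byte order of the machine; that layout is an assumption, and the helper names colmajor, write_matrix, and write_input_pair are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Offset of element (i, j) in an m-row matrix stored in column-major order.
inline std::size_t colmajor(std::size_t i, std::size_t j, std::size_t m) {
  return j * m + i;
}

// Write a rows x cols column-major matrix of doubles as raw bytes.
// Assumption: no header, native byte order.
void write_matrix(const std::string& path, const std::vector<double>& a,
                  std::size_t rows, std::size_t cols) {
  if (a.size() != rows * cols)
    throw std::invalid_argument("matrix size does not match rows * cols");
  std::ofstream out(path, std::ios::binary);
  if (!out) throw std::runtime_error("cannot open " + path);
  out.write(reinterpret_cast<const char*>(a.data()),
            static_cast<std::streamsize>(a.size() * sizeof(double)));
  if (!out) throw std::runtime_error("write failed for " + path);
}

// Write F (mf x n) to FN.Y and G (mg x n) to FN.W for a given prefix fn.
void write_input_pair(const std::string& fn,
                      const std::vector<double>& F, std::size_t mf,
                      const std::vector<double>& G, std::size_t mg,
                      std::size_t n) {
  write_matrix(fn + ".Y", F, mf, n);
  write_matrix(fn + ".W", G, mg, n);
}
```

With this layout, the element in row i and column j of F sits at F[colmajor(i, j, mf)], so consecutive entries of one column are adjacent in the file.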

The output comprises FN.YU, FN.WV, FN.Z, for the double or double complex matrices U (MF x N), V (MG x N), and Z (N x N); and FN.SY, FN.SW, FN.SS, for the double vectors \Sigma_F, \Sigma_G, and \Sigma, respectively, where all vectors are of length N.
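Reading the results back can be sketched in the same way. The code below covers only the three double vectors, and it assumes, as for the input, that each file is a raw array of N doubles with no header in native byte order; read_doubles, HzSigmas, and read_sigmas are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Read exactly `count` doubles from a raw binary file.
// Assumption: no header, native byte order; a size mismatch is an error.
std::vector<double> read_doubles(const std::string& path, std::size_t count) {
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (!in) throw std::runtime_error("cannot open " + path);
  const std::streamoff bytes = in.tellg();  // opened at the end: file size
  if (bytes < 0 ||
      static_cast<unsigned long long>(bytes) != count * sizeof(double))
    throw std::runtime_error("unexpected size of " + path);
  in.seekg(0);
  std::vector<double> v(count);
  in.read(reinterpret_cast<char*>(v.data()), bytes);
  if (!in) throw std::runtime_error("read failed for " + path);
  return v;
}

// The three output vectors, each of length N.
struct HzSigmas {
  std::vector<double> sigma_f;  // from FN.SY
  std::vector<double> sigma_g;  // from FN.SW
  std::vector<double> sigma;    // from FN.SS
};

HzSigmas read_sigmas(const std::string& fn, std::size_t n) {
  return HzSigmas{read_doubles(fn + ".SY", n),
                  read_doubles(fn + ".SW", n),
                  read_doubles(fn + ".SS", n)};
}
```

Under the same assumption, a real U could be read with read_doubles(fn + ".YU", mf * n) and indexed in column-major order; the double complex case would need a different element type and is not covered by this sketch.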

See also the FLAPWxHZ repository for more explanation.

This work has been supported in part by the Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).
