The Jacobi-type (hyperbolic) Singular Value Decomposition in CUDA (for 1 GPU).
This software is a supplementary material for the paper doi:10.1137/140952429.
The preprint is available at arXiv:1401.2720.
Multi-GPU level is still under cleanup and will be added eventually.
- 1 - Test program for 1 GPU,
- share - Common and device code,
- strat - Jacobi strategy tables.
Build strategy tables:
cd strat ./proc_all-lnx.sh (Linux), or ./proc_all-win.bat (Windows), or ./proc_all-mac.sh (MacOS).
Build the test program:
cd 1 ./mkLNX.sh SM D (Linux), or ./mkWIN.bat SM D (Windows), or ./mkMAC.sh SM D (MacOS), where
- SM = target GPU architecture (e.g., 30 or 35 or 37 for Kepler, 20 for Fermi),
- D = optimization level (usually 3).
Prerequisites for the test program:
- HDF5 library (please specify where it is installed by editing share/Makefile.XXX).
./x1.exe DEV SDY SNP0 SNP1 ALG H5F H5G [H5R]
- DEV = CUDA device number
- SDY = path to strat.so (Linux), or strat.dll (Windows), or strat.dylib (MacOS)
- SNP0 = BrentL or rowcyc or colcyc or cycwor or cycloc or mmstep strategy
- SNP1 = BrentL or rowcyc or colcyc or cycwor or cycloc or mmstep strategy
- ALG = algorithm number (see 1/HypJacL2.hpp)
- H5F = input HDF5 file (see main.cu)
- H5G = input/output HDF5 group
- H5R = optional output HDF5 file
Input file format
Here, an example of the input files can be found: http://euridika.math.hr:1846/Jacobi/data/
In essence, an HDF5 file can contain mulitple input matrices and the associated metadata, with each input in its own HDF5 group within the file.
Here is a Matlab code template to create an input dataset:
% G: input double precision M x N real matrix to compute (H)SVD of. % size(G) = [LDA N] % LDA >= M >= N >= NPLUS >= 0; % for 1 GPU: (LDA = M >= N) mod 64 == 0, % for 4 GPUs: (LDA = M >= N) mod 256 == 0, % N = NPLUS for SVD (J = I), % N > NPLUS for HSVD (NPLUS positive (+1) and N-NPLUS negative (-1) signs in J). IDADIM=int32([LDA; M; N; NPLUS]); % Substitute 'file' and 'group' with appropriate names. h5create('file.h5','/group/IDADIM',,'Datatype','int32') h5write('file.h5','/group/IDADIM',IDADIM) h5create('file.h5','/group/G',size(G)) h5write('file.h5','/group/G',G)
For Scilab it would be similar:
h5=h5open("file.h5","w") h5group(h5,"/group") IDADIM=int32([LDA;M;N;NPLUS]) h5write(h5,"/group/IDADIM",IDADIM) h5write(h5,"/group/G",G) h5close(h5)
file.h5 there will be a group named
group created, with two datasets:
IDADIM is an integer vector containing the dimensions of the matrix
G, which has
M rows and
N columns, and is stored in Fortran array order (column by column), with the leading dimension
For now, please, make sure that
group name is in fact the number of columns of
E.g., if the input matrix dataset has
2048 columns, the HDF5 group in which it is contained should be named
Hint: if the file is viewed with, e.g., HDFView,
G will look as if having
N rows and
M columns, which is perfectly normal, since the Fortran data look transposed in C, and vice versa.
Matlab and Scilab store and read data in the correct format for GPUJACHx.
For simplicity, it is expected that
LDA=M>=N, and that all dimensions are a multiple of
64. That is not a big issue, though, since a matrix can always be "bordered".
The bordering can be done in, e.g., Matlab or Scilab, as follows (
I is an identity matrix):
The output, if requested, is stored in the same-named
G being the matrix of the left singular vectors,
V the matrix of the right singular vectors (non-transposed), and
D are the eigenvalues of
trans(G)*J*G, i.e., the squares of the singular values (for the "ordinary" SVD, with
Note that the strategies do not support all possible input dimensions!
Please have a look at the strategies' headers in
It is expected that the fastest execution will be with cycwor.
Example, for the full SVD with the block oriented variant and output in
./x1.exe 0 ../strat/strat.dylib cycwor cycwor 9 input.h5 group output.h5
This work has been supported in part by Croatian Science Foundation under the project IP-2014-09-3670 (MFBDA).