
Surprisingly high memory requirements in LU factorization #50

Open
MonkeysAreEvil opened this issue Jul 6, 2021 · 8 comments

Comments

@MonkeysAreEvil

Hi,

I'm using armadillo to do eigendecomposition on some sparse matrices; armadillo uses SuperLU and ARPACK for this. For the matrices I'm interested in, armadillo calls zgstrf, zgstrs, and ZNAUPD (ARPACK). I'm hitting 'Not enough memory' errors for what I would consider not especially large matrices: a small increase in matrix size requires a huge increase in memory usage.

Tracing the execution, the error happens in zgstrf.c:255. I can generate random matrices of the same size and around the same density and reliably hit the same problem. I'm guessing the fill-in is somehow related, but I don't really understand the algorithms involved.

For example, a (2x27x666)x(2x27x666) matrix I can diagonalise comfortably with <32G of memory. Jumping to (2x28x666)x(2x28x666) requires >256G: with 256G of memory I hit a 'Not enough memory' error from zgstrf. I have access to more memory, but it seems pointless to push further until I know why the memory usage suddenly explodes for this small size increase.

I'm not sure if this is actually a bug, or known behaviour. I'm open to suggestions on what's going on.

Notes:

  • I'm using Hermitian matrices
  • The density is around 0.05
  • The rcond is around 1e-5
  • SuperLU v5.2
@xiaoyeli
Owner

This doesn't make sense. Can you send more diagnostic info? Number of nonzeros in the matrix? What does the output look like in the successful execution of 2x27x666? Can you recompile the code with PRNTlevel=1? That prints more info.
Also, since your matrix is Hermitian, you can use "symmetric mode", as explained in this FAQ page:
https://portal.nersc.gov/project/sparse/superlu/faq.html#sym-problem

@MonkeysAreEvil
Author

MonkeysAreEvil commented Jul 19, 2021

Number of nonzeros: 73586988
Output with PRNTlevel=1:

.. parameters in sp_ienv():
         1: panel size     20 
         2: relax          10 
         3: max. super    200 
         4: row-dim 2D    200 
         5: col-dim 2D    100 
         6: fill ratio     30 
.. Use approximate minimum degree column ordering.
zLUMemInit() called: fill_ratio 30, nzlmax 551902410, nzumax 2147483648
Not enough memory to perform factorization.

Enabling symmetric mode gives the exact same error.

For a smaller matrix, e.g. 28666, I get

.. parameters in sp_ienv():
         1: panel size     20 
         2: relax          10 
         3: max. super    200 
         4: row-dim 2D    200 
         5: col-dim 2D    100 
         6: fill ratio     30 
.. Use approximate minimum degree column ordering.
zLUMemInit() called: fill_ratio 30, nzlmax 139211610, nzumax 556846440

and then proceeds to run without issue.

Actually, on closer reading of the armadillo docs, I can directly diagonalise around the smallest magnitude with ARPACK ("sm" mode), rather than doing a shift-invert with SuperLU. Unfortunately, "sm" is significantly slower than ARPACK+SuperLU and doesn't reliably converge (i.e. I have to play with the number of eigenvalues to return and the maximum number of iterations, which is annoying because there's no way to know ahead of time what values will work). So there's still value in figuring out what's going on here.
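
For reference, a minimal sketch of the two call forms in Armadillo (the matrix below is just a random stand-in; the exact overloads depend on the Armadillo version, and shift-invert mode additionally requires Armadillo to be built against SuperLU):

#include <armadillo>
// Sketch only: assumes Armadillo built with ARMA_USE_ARPACK and ARMA_USE_SUPERLU.

int main()
{
    const arma::uword n = 2000;

    // random complex sparse matrix, made Hermitian for illustration
    arma::sp_cx_mat A = arma::sprandu<arma::sp_cx_mat>(n, n, 0.05);
    A = A + A.t();   // .t() is the conjugate transpose

    arma::cx_vec eigval;
    arma::cx_mat  eigvec;

    // shift-invert around sigma = 0: ARPACK + SuperLU (zgstrf/zgstrs);
    // fast, but subject to the LU memory blow-up described above
    arma::eigs_gen(eigval, eigvec, A, 10, arma::cx_double(0.0, 0.0));

    // "sm" mode: smallest-magnitude eigenvalues via ARPACK alone;
    // no LU factorisation, but slower and may fail to converge
    arma::eigs_gen(eigval, eigvec, A, 10, "sm");

    return 0;
}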

thanks!

@MonkeysAreEvil
Author

MonkeysAreEvil commented Sep 16, 2021

I believe I've figured this out. If I look closely at one of my problematic matrices, I find that nzlumax = -2147483648, which is precisely the integer overflow from int nzlumax = fill_ratio * annz for fill_ratio = 30, annz = 73586988 (for the sake of other readers, annz is the number of nonzero elements in the matrix). There's a check in zmemory.c:267 that fails if nzlumax < annz. Obviously, if nzlumax overflows, this check is triggered and an erroneous "Not enough memory" error is reported.
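
A tiny standalone reproduction of the arithmetic (variable names mirror the ones above; signed overflow is technically undefined behaviour, so the exact wrapped value can differ from what SuperLU reports):

#include <cstdint>
#include <cstdio>

int main()
{
    int fill_ratio = 30;
    int annz       = 73586988;   // nonzeros reported above

    // 30 * 73586988 = 2207609640 > INT_MAX (2147483647), so the
    // 32-bit product overflows and in practice comes out negative
    int nzlumax32 = fill_ratio * annz;

    // widening one operand before the multiply keeps the full value
    std::int64_t nzlumax64 = static_cast<std::int64_t>(fill_ratio) * annz;

    std::printf("32-bit product: %d\n", nzlumax32);
    std::printf("64-bit product: %lld\n", static_cast<long long>(nzlumax64));
    return 0;
}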

The numbers given here are obviously unique to my matrices, but problematic values arise for various combinations of matrix size and density, which is why I can reproduce this issue with random matrices. Actually, this is probably the same issue as in #24.

The easy solution is to upgrade nzlumax to uint64_t or a larger type. This can be achieved, e.g., by the attached patch, fix_zmemoryinit_integer_overflow.patch.txt. I suspect all gstrf functions need to be fixed, and possibly others too. Also, this approach might have portability issues.

There may be a more robust approach. For example, we could add a -DLONG_INT compile flag which controls a typedef like typedef int int_t; vs typedef long long int int_t;. Possibly this may break other things, so it's not obvious to me in general what the best approach is.

I'm happy to do a pull request with a more complete fix, based on above or similar.

cheers

@xiaoyeli
Owner

That is a good diagnosis. Thanks for sharing this.
Serial SuperLU was developed a long time ago; at that time, we didn't foresee the code being used to solve systems large enough to cause int32 overflow.
In the newer codes, superlu_mt (multithreaded) and superlu_dist (distributed memory), we use a new integer type, int_t, which can be defined as int or long long int.

This int_t approach is not implemented in serial SuperLU. It may take some effort to do, because it needs to be done for all the compressed sparse storages, for both the matrix A and {L,U} (where it is particularly needed). In their metadata structures, whenever a 'number of nonzeros' is involved, the variables need to be int_t.
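
Roughly, the pattern looks like this (a sketch only; the macro name and the struct fields below are illustrative rather than copied from superlu_mt):

// index type selected at compile time, along the lines of superlu_mt / superlu_dist
#ifdef LONGINT
typedef long long int int_t;   // 64-bit nonzero counts and offsets
#else
typedef int int_t;             // default 32-bit indices
#endif

// any metadata that stores a "number of nonzeros" must use int_t,
// e.g. a compressed-column storage header for A or for {L,U}
struct CCSketch
{
    int_t   nnz;      // number of nonzeros (this is what overflows today)
    double *nzval;    // nonzero values
    int_t  *rowind;   // row indices of the nonzeros
    int_t  *colptr;   // column pointers, length ncol + 1
};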

Perhaps you can take a look at the superlu_mt code.

@MonkeysAreEvil
Author

MonkeysAreEvil commented Sep 21, 2021

Thanks for the suggestions. I've started working on this, but it will take some time to get working properly. int_t is already used almost everywhere sensible, and it's straightforward to add -Denable_longint to the CMakeLists. Updating the *memory and *gstrf functions has also been straightforward. Unfortunately, a lot of utility/plumbing code implicitly uses int32, so I'm stuck with a lot of fiddly work. In particular, I'm seeing some strange memory and performance issues, so it will take a little while to sort out.

Also, armadillo doesn't define int_t quite correctly, so I'll have to coordinate with them; it may be that SciPy and other projects have a similar issue, but I haven't checked.

@xiaoyeli
Owner

xiaoyeli commented Sep 21, 2021 via email

@WarrenWeckesser

... it may be that SciPy and other projects have a similar issue but I haven't checked.

Yes, see scipy/scipy#15688

@gruenich
Contributor

gruenich commented Apr 7, 2023

In recent days, @xiaoyeli introduced support for 64-bit indexing (44f2c5e, 91276c9, 8fea5b0, 32cba61). Does this resolve this issue? Can you check and provide feedback?
