
Surprisingly high memory requirements in LU factorization #50

Open
MonkeysAreEvil opened this issue Jul 6, 2021 · 8 comments

Comments

@MonkeysAreEvil

Hi,

I'm using armadillo to do eigendecomposition on some sparse matrices; armadillo uses SuperLU and ARPACK for this. For the matrices I'm interested in, armadillo calls zgstrf, zgstrs, and ZNAUPD (ARPACK). I'm hitting 'Not enough memory' errors for what I would consider not especially large matrices: a small increase in matrix size requires a huge increase in memory usage.

Tracing the execution, the error happens in zgstrf.c:255. I can generate random matrices of the same size and around the same density and reliably hit the same problem. I'm guessing the fill-in is somehow related, but I don't really understand the algorithms involved.

For example, a (2x27x666)x(2x27x666) matrix I can diagonalise comfortably with <32G of memory. Jumping to (2x28x666)x(2x28x666) requires >256G: with 256G of memory I hit a 'Not enough memory' error from zgstrf. I have access to more memory, but it seems pointless to push further until I know why the memory usage suddenly explodes for this small size increase.

I'm not sure if this is actually a bug, or known behaviour. I'm open to suggestions on what's going on.

Notes:

  • I'm using Hermitian matrices
  • The density is around 0.05
  • The rcond is around 1e-5
  • SuperLU v5.2
@xiaoyeli
Owner

This doesn't make sense. Can you send more diagnostic info? Number of nonzeros in the matrix? What does the output look like in the successful execution of 2x27x666? Can you recompile the code with PRNTlevel=1? That prints more info.
Also, since your matrix is Hermitian, you can use "symmetric mode", as explained in this FAQ page:
https://portal.nersc.gov/project/sparse/superlu/faq.html#sym-problem

@MonkeysAreEvil
Author

MonkeysAreEvil commented Jul 19, 2021

Number of nonzeros: 73586988
Output with PRNTlevel=1:

.. parameters in sp_ienv():
         1: panel size     20 
         2: relax          10 
         3: max. super    200 
         4: row-dim 2D    200 
         5: col-dim 2D    100 
         6: fill ratio     30 
.. Use approximate minimum degree column ordering.
zLUMemInit() called: fill_ratio 30, nzlmax 551902410, nzumax 2147483648
Not enough memory to perform factorization.

Enabling symmetric mode gives the exact same error.

For a smaller matrix, e.g. 28666, I get

.. parameters in sp_ienv():
         1: panel size     20 
         2: relax          10 
         3: max. super    200 
         4: row-dim 2D    200 
         5: col-dim 2D    100 
         6: fill ratio     30 
.. Use approximate minimum degree column ordering.
zLUMemInit() called: fill_ratio 30, nzlmax 139211610, nzumax 556846440

and then proceeds to run without issue.

Actually, on closer reading of the armadillo docs, I can directly diagonalise around the smallest magnitude with ARPACK ("sm" mode), rather than doing a shift-invert with SuperLU. Unfortunately, "sm" is significantly slower than ARPACK+SuperLU and doesn't reliably converge (i.e. I have to play with the number of eigenvalues to return and the maximum number of iterations, which is annoying because there's no way to know ahead of time what values will work). So there's still value in figuring out what's going on here.
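
For reference, a minimal sketch of the two call forms in Armadillo (the matrix below is just a random stand-in; the exact overloads depend on the Armadillo version, and shift-invert mode additionally requires Armadillo to be built against SuperLU):

#include <armadillo>
// Sketch only: assumes Armadillo built with ARMA_USE_ARPACK and ARMA_USE_SUPERLU.

int main()
{
    const arma::uword n = 2000;

    // random complex sparse matrix, made Hermitian for illustration
    arma::sp_cx_mat A = arma::sprandu<arma::sp_cx_mat>(n, n, 0.05);
    A = A + A.t();   // .t() is the conjugate transpose

    arma::cx_vec eigval;
    arma::cx_mat  eigvec;

    // shift-invert around sigma = 0: ARPACK + SuperLU (zgstrf/zgstrs);
    // fast, but subject to the LU memory blow-up described above
    arma::eigs_gen(eigval, eigvec, A, 10, arma::cx_double(0.0, 0.0));

    // "sm" mode: smallest-magnitude eigenvalues via ARPACK alone;
    // no LU factorisation, but slower and may fail to converge
    arma::eigs_gen(eigval, eigvec, A, 10, "sm");

    return 0;
}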

thanks!

@MonkeysAreEvil
Author

MonkeysAreEvil commented Sep 16, 2021

I believe I've figured this out. If I look closely at one of my problematic matrices, I find that nzlumax = -2147483648, which is precisely the integer overflow from int nzlumax = fill_ratio * annz for fill_ratio = 30, annz = 73586988 (for the sake of other readers, annz is the number of nonzero elements in the matrix). There's a check in zmemory.c:267 that fails if nzlumax < annz. Obviously, if nzlumax overflows, this check is triggered and an erroneous "Not enough memory" error is reported.
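
A tiny standalone reproduction of the arithmetic (variable names mirror the ones above; signed overflow is technically undefined behaviour, so the exact wrapped value can differ from what SuperLU reports):

#include <cstdint>
#include <cstdio>

int main()
{
    int fill_ratio = 30;
    int annz       = 73586988;   // nonzeros reported above

    // 30 * 73586988 = 2207609640 > INT_MAX (2147483647), so the
    // 32-bit product overflows and in practice comes out negative
    int nzlumax32 = fill_ratio * annz;

    // widening one operand before the multiply keeps the full value
    std::int64_t nzlumax64 = static_cast<std::int64_t>(fill_ratio) * annz;

    std::printf("32-bit product: %d\n", nzlumax32);
    std::printf("64-bit product: %lld\n", static_cast<long long>(nzlumax64));
    return 0;
}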

The numbers given here are obviously unique to my matrices, but problematic values arise for various combinations of matrix size and density, which is why I can reproduce this issue with random matrices. Actually, this is probably the same issue as in #24.

The easy solution is to upgrade nzlumax to uint64_t or a larger type. This can be achieved, e.g., by the attached patch, fix_zmemoryinit_integer_overflow.patch.txt. I suspect all gstrf functions need to be fixed, and possibly others too. Also, this approach might have portability issues.

There may be a more robust approach. For example, we could add a -DLONG_INT compile flag which controls a typedef like typedef int int_t; vs typedef long long int int_t;. Possibly this may break other things, so it's not obvious to me in general what the best approach is.

I'm happy to do a pull request with a more complete fix, based on above or similar.

cheers

@xiaoyeli
Owner

That is a good diagnosis. Thanks for sharing this.
Serial SuperLU was developed a long time ago; at that time, we didn't foresee the code being used to solve systems large enough to cause int32 overflow.
In the newer codes, superlu_mt (multithreaded) and superlu_dist (distributed memory), we use a new integer type, int_t, which can be defined as int or long long int.

This int_t approach is not implemented in serial SuperLU. It may take some effort to do, because it needs to be done for all the compressed sparse storages, for both the matrix A and {L,U} (where it is particularly needed). In their metadata structures, whenever a 'number of nonzeros' is involved, the variables need to be int_t.
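
Roughly, the pattern looks like this (a sketch only; the macro name and the struct fields below are illustrative rather than copied from superlu_mt):

// index type selected at compile time, along the lines of superlu_mt / superlu_dist
#ifdef LONGINT
typedef long long int int_t;   // 64-bit nonzero counts and offsets
#else
typedef int int_t;             // default 32-bit indices
#endif

// any metadata that stores a "number of nonzeros" must use int_t,
// e.g. a compressed-column storage header for A or for {L,U}
struct CCSketch
{
    int_t   nnz;      // number of nonzeros (this is what overflows today)
    double *nzval;    // nonzero values
    int_t  *rowind;   // row indices of the nonzeros
    int_t  *colptr;   // column pointers, length ncol + 1
};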

Perhaps you can take a look at the superlu_mt code.

@MonkeysAreEvil
Author

MonkeysAreEvil commented Sep 21, 2021

Thanks for the suggestions. I've started working on this, but it will take some time to get working properly. int_t is already used almost everywhere sensible, and it's straightforward to add -Denable_longint to the CMakeLists. Updating the *memory and *gstrf functions has also been straightforward. Unfortunately, a lot of utility/plumbing code implicitly uses int32, so I'm stuck with a lot of fiddly work. In particular, I'm seeing some strange memory and performance issues, so it will take a little while to sort out.

Also, armadillo doesn't define int_t quite correctly, so I'll have to coordinate with them; it may be that SciPy and other projects have a similar issue, but I haven't checked.

@xiaoyeli
Owner

xiaoyeli commented Sep 21, 2021 via email

@WarrenWeckesser

... it may be that SciPy and other projects have a similar issue but I haven't checked.

Yes, see scipy/scipy#15688

@gruenich
Contributor

gruenich commented Apr 7, 2023

In recent days, @xiaoyeli introduced support for 64-bit indexing (44f2c5e, 91276c9, 8fea5b0, 32cba61). Does this resolve this issue? Can you check and provide feedback?
