Running out of memory in inverse and shift mode on sparse matrices #10

I was also unable to use eigs_sym(); I was getting "matrix too large" errors in cholmod_dense.c.

Hi @swajnautcz

I'm using the pre-built stable version for Windows from CRAN (rARPACK_0.7-0.zip on https://cran.r-project.org/web/packages/rARPACK/index.html). Is there a difference between those?

The development version is 0.8-0, and you may want to give it a try. A pre-built package can be found at http://win-builder.r-project.org/r0kJrdtI7NyO/. However, the problem might still occur, given the difficulty of the computation. To use the shift-and-invert mode, we need to factorize a large matrix whose factors are generally not sparse, and hence the factorization may run out of memory. Another issue is that if the matrix is singular, the computation is certain to fail. Usually for large matrices we are only interested in calculating the largest eigenvalues. Is there any specific reason why you also want the smallest ones?

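For illustration, here is a minimal C++ sketch (not the package's actual code; the tiny matrix and all names are invented) of what the shift-and-invert mode has to do internally with Eigen: form A - sigma*I, factorize it once, and solve against the factors on every iteration.

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseLU>
#include <iostream>

int main() {
    // Tiny SPD tridiagonal matrix standing in for the large sparse input
    const int n = 5;
    Eigen::SparseMatrix<double> A(n, n);
    for (int i = 0; i < n; ++i) {
        A.insert(i, i) = 2.0;
        if (i > 0)     A.insert(i, i - 1) = -1.0;
        if (i < n - 1) A.insert(i, i + 1) = -1.0;
    }
    A.makeCompressed();

    // Shift: eigenvalues of A closest to sigma become the largest
    // eigenvalues of (A - sigma*I)^{-1}, which ARPACK finds quickly.
    const double sigma = 0.0;
    Eigen::SparseMatrix<double> Id(n, n);
    Id.setIdentity();
    Eigen::SparseMatrix<double> shifted = A - sigma * Id;

    // This one-off factorization is where the memory goes: the LU
    // factors are generally much denser than the matrix itself.
    Eigen::SparseLU<Eigen::SparseMatrix<double>> lu;
    lu.compute(shifted);
    if (lu.info() != Eigen::Success) return 1;

    // Each eigensolver iteration then costs one pair of triangular
    // solves against the stored factors.
    Eigen::VectorXd b = Eigen::VectorXd::Ones(n);
    Eigen::VectorXd x = lu.solve(b);
    std::cout << x.transpose() << std::endl;
    return 0;
}
```
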
The matrix is positive definite (and hence invertible). I'm doing some analysis of least-squares parameterizations, and I'm looking for the condition numbers of the matrices. One way to get the condition number is to take the ratio of the largest and smallest eigenvalues. I will give 0.8-0 a try and let you know.

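For context: for a symmetric positive definite matrix the 2-norm condition number equals λmax/λmin, which is why the two extreme eigenvalues suffice. A small dense illustration of that identity (a toy example, only feasible for small matrices; the large sparse case needs an iterative eigensolver, as discussed in this thread):

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
    // Small SPD example; for a 174515 x 174515 sparse matrix only the
    // two extreme eigenvalues would be computed iteratively instead.
    Eigen::MatrixXd A(3, 3);
    A << 4, 1, 0,
         1, 3, 1,
         0, 1, 2;

    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(A);
    const double lambda_min = es.eigenvalues().minCoeff();
    const double lambda_max = es.eigenvalues().maxCoeff();

    // For an SPD matrix, cond_2(A) = lambda_max / lambda_min
    std::cout << "cond(A) = " << lambda_max / lambda_min << std::endl;
    return 0;
}
```
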
The matrix in question can be downloaded at https://sourceforge.net/projects/slamfrontend/files/data/eigenvalues/ in case you want to try it.

Ok, so admittedly, 0.8-0 did calculate the solution, and the results do match the Matlab results. However, to do so, it needed over 15 GB of RAM and about 2 hours of runtime. What kind of matrix factorization is being done in this case? My code factorizes the matrix in a second, and solving is instant. I'm quite sure that either I'm calling the code from R inappropriately and a dense decomposition is called somewhere, or the decomposition is being calculated without a fill-reducing ordering.

Thanks for the test. Since this package uses Eigen (http://eigen.tuxfamily.org/) for linear algebra, the factorization I'm using for sparse matrices is an implementation of SuperLU (http://eigen.tuxfamily.org/dox/classEigen_1_1SparseLU.html). This can be improved, since we can use a sparse LDLT to optimize for symmetric matrices. Eigen also provides other linear solvers, including iterative methods (http://eigen.tuxfamily.org/dox/group__TopicSparseSystems.html). I have little experience with large sparse linear solvers, so could you give me some hints on which solver I should use? Seeing that you can factorize a matrix at this scale in a second, I'm eager to learn what technique you are using. Thank you!

Oh, I see. Eigen's sparse LU should be fine, unless something fishy is going on. I'm currently using a sparse LLT factorization, as my matrix is positive definite, but that should be only about 1/3 faster than LU and use about 1/2 of the memory, so LU should still be very reasonable, yet versatile. For non-square problems I would use a sparse QR factorization. All these methods require a good ordering in order to run well (in Eigen that's the analyzePattern() function). Iterative solving, on the other hand, is slightly less precise, and I believe it is only slightly faster if you solve once. If you solve many times, the costly factorization is amortized by the much simpler solving, which degrades to back-substitution (similar in cost to a matrix-vector multiply; usually two of these are needed). Could you point me to the code where you calculate the inverse of the sparse matrix? I started using the C++ code directly, rather than through R, and was able to obtain more than reasonable results with this:

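The snippet itself did not survive the page capture; the following is a reconstruction sketch of the pattern this comment describes, not the original code: a sparse LLT with an explicit analyzePattern() step, factorized once and then solved against repeatedly.

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>
#include <iostream>

int main() {
    // Tiny SPD stand-in for the 174515 x 174515 matrix from the thread
    const int n = 5;
    Eigen::SparseMatrix<double> A(n, n);
    for (int i = 0; i < n; ++i) {
        A.insert(i, i) = 2.0;
        if (i > 0)     A.insert(i, i - 1) = -1.0;
        if (i < n - 1) A.insert(i, i + 1) = -1.0;
    }
    A.makeCompressed();

    Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> llt;
    llt.analyzePattern(A);  // fill-reducing (AMD) ordering, symbolic phase
    llt.factorize(A);       // numeric factorization, paid only once
    if (llt.info() != Eigen::Success) return 1;

    // Repeated solves amortize the factorization: each one degrades to
    // two triangular back-substitutions, roughly the cost of a couple
    // of sparse matrix-vector products.
    for (int k = 1; k <= 3; ++k) {
        Eigen::VectorXd b = Eigen::VectorXd::Constant(n, double(k));
        Eigen::VectorXd x = llt.solve(b);
        std::cout << x.transpose() << std::endl;
    }
    return 0;
}
```
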
The code is at https://github.com/yixuan/rARPACK/blob/master/inst/include/RMatOp/RealShift_sparseMatrix.h. I tried to test your data file, but it seems that the link is broken. Is there any other way to access the file? I do want to investigate the weird behavior of the sparse LU. Also, if you prefer C++ code, you may want to take a look at the Eigen-based solver (https://github.com/yixuan/arpack-eigen) that the R interface is built on.

Yes, so the problem appears to be that you are not calling solver.analyzePattern() before factorizing the matrix. The improved code would be:

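This snippet was lost in the capture as well; here is a minimal sketch of the fix being described, calling analyzePattern() to obtain a fill-reducing (COLAMD) ordering before factorize(). The function and variable names are illustrative.

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseLU>

// A: sparse system matrix, b: right-hand side (set up elsewhere)
Eigen::VectorXd factor_and_solve(const Eigen::SparseMatrix<double>& A,
                                 const Eigen::VectorXd& b) {
    Eigen::SparseLU<Eigen::SparseMatrix<double>,
                    Eigen::COLAMDOrdering<int>> solver;
    solver.analyzePattern(A);  // fill-reducing column ordering first
    solver.factorize(A);       // numeric LU under that ordering
    return solver.solve(b);    // cheap once the factors are available
}
```
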
If you do not call analyzePattern(), it will calculate the factorization with the natural ordering, which may introduce considerable fill-in (additional nonzero entries in the factors), costing both more memory and longer runtimes for the factorization and the solving alike. See e.g. the example sections of http://au.mathworks.com/help/matlab/ref/colamd.html and http://au.mathworks.com/help/matlab/ref/symamd.html, and notice how dense the LU gets without the ordering. As for the link, http://sourceforge.net/projects/slamfrontend/files/data/eigenvalues/ should now work. Apologies for that; there was a permission problem. You can verify that there is a large difference in runtime with and without calling .analyzePattern().

It's very nice of you to give such a careful explanation! Actually I notice that the … Later I'll try the sparse Cholesky decomposition, to see if it could give better performance.

Ah, I see, I have never used that one before, so I did not know that. I'll try to use it in my code then and will let you know what the result is.

I'd also like to test the Cholesky solver on this data set, but I saw errors when I tried to extract it. Could you double-check the validity of the zip file? Thanks.

I see, the problems never end :). Sorry, I'm on vacation in Australia and the connection here is just rubbish; the upload probably dropped the first time around without me noticing. I have re-uploaded the file; it should indeed have been 115 MB instead of 50 MB. I have tried using your code, and indeed the Eigen sparse LU wrapper seems to be eating loads of memory. Investigating further, I used CSparse to get the LU decomposition as well (since Eigen does not allow access to the L and U matrices in its sparse LU class). While the Cholesky factorization is quite sparse (10 M nonzeroes), the sparsity of the LU factorization also depends on the pivoting. Apparently, my matrix is not very well conditioned, and the pivoting changes the ordering quite a lot, introducing many nonzeroes (220 M). I'm not sure whether using a sparse LDLT would make a difference here.

I don't have the complete results yet, but as it stands, Cholesky and LDLT are both fast on the given matrix, while LU and QR are slow and need a lot of memory. That is not a fault of the implementation, but rather of the matrix.

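For completeness, a hedged sketch of the LDLT variant mentioned above (names invented): in Eigen it is essentially a one-line change from LLT, avoids the square roots, tolerates symmetric indefinite matrices, and uses a fill-reducing AMD ordering by default.

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Same shape as the LLT version above; only the solver type changes.
Eigen::VectorXd solve_ldlt(const Eigen::SparseMatrix<double>& A,
                           const Eigen::VectorXd& b) {
    Eigen::SimplicialLDLT<Eigen::SparseMatrix<double>> ldlt;
    ldlt.compute(A);  // compute() runs analyzePattern() + factorize()
    return ldlt.solve(b);
}
```
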
You are correct. This is a significant improvement over the previous version. A lot of thanks to you!

Great, that's a significant improvement indeed. There is one more issue with my script, which is more on the R side, I suppose. When trying to call …, then depending on the combination of …, I get one error or another, or the infamous Windows message that "process R stopped working". I know that this probably does not concern the package implementation so much, but still, do you have an idea how to call …?

Also, not sure if you have noticed, but instead of the …, it is possible to …, which will probably be even slightly faster and require less memory.

Thanks for pointing this out. Actually I have already adopted this usage. :-)

Right, you have just proved my poor knowledge of R. I didn't realize I was explicitly asking for a dense matrix; I thought the 'd' there stands for 'double'. Thanks a lot for your help, and good luck developing the package further.

Thank you as well for your enormously kind help. I'll let you know once I finish the extension of ….

I've added the support for …; here is the result on your matrix:

```r
library(Matrix)
f <- file("file://./lambda.mtx", "r")
str(m <- readMM(f))
close(f)
# assemble a column-compressed sparse matrix from the triplet form
A <- sparseMatrix(m@i, m@j, index1 = FALSE, x = m@x, giveCsparse = TRUE)

library(rARPACK)
system.time(le <- eigs_sym(A, k = 5, which = "LM", opts = list(retvec = FALSE)))
##  user  system elapsed
## 1.942   0.009   1.949
system.time(le <- eigs_sym(A, k = 5, sigma = 0, opts = list(retvec = FALSE)))
##  user  system elapsed
## 3.921   0.347   4.264
```


Hello,
I am trying to get the eigenvalues of a large sparse matrix in R. For the largest eigenvalues I have no problems, but for the smallest ones, "SM" does not converge even if I increase the number of iterations and the tolerance. If I use sigma = 0, then I get a bad_alloc after a while. The matrix in question is real symmetric, 174515 x 174515, with 9363966 nonzeroes, and is just over 300 MB in the .mtx (text) format. I'm running the 64-bit version of R and there is 16 GB of RAM in the box, so I see little reason why this should happen. Obviously I'm doing something wrong. My R code is essentially the snippet shown above: reading the matrix from the .mtx file and calling eigs_sym().
I have never written anything in R; maybe that's the issue. Any insights?