Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test error in "SVD Converge" in generic.py #69

Closed
Ali-Tehrani opened this issue Mar 8, 2021 · 10 comments
Closed

Test error in "SVD Converge" in generic.py #69

Ali-Tehrani opened this issue Mar 8, 2021 · 10 comments

Comments

@Ali-Tehrani
Copy link
Contributor

Ali-Tehrani commented Mar 8, 2021

During testing with github action, the follow test failed "test_generic_rectangular_translate_scale" for test_generic.py" with m = 829, n = 878 on python 3.7.

The error is
def _raise_linalgerror_svd_nonconvergence(err, flag): raise LinAlgError("SVD did not converge") E numpy.linalg.LinAlgError: SVD did not converge.

For more info on the github action, see https://github.com/Ali-Tehrani/procrustes/runs/2058961187.

However, running this test on my own machine it passes. So it seems the error is hardware related. Google searching this returns a 2019 SciPy issue [1]. I would recommend to decrease the matrix size so that it isn't hardware/MKL/OpenBlas dependent.

[1] - scipy/scipy#10032

@PaulWAyers
Copy link
Member

It may still have a problem, I guess, because matrices with rows/columns of zeros are evil..... It's a strange error (that we encountered once before in GOpt) because SVD is a matrix factorization and therefore is very robust. But the divide-and-conquer algorithm is not as robust (but it is significantly faster). I guess that using smaller matrices is a good compromise, but perhaps we should at least document this error message, which I think is not unlikely to appear when zero-padding is substantial because the size-mismatch of the matrices is large. The obvious solution is to directly access the SVD algorithm that is a factorizaton (instead of the default divide-and-conquer) algorithm. Derrick may remember how to do that. He also had a hack-ish way to fix the error.

@Ali-Tehrani Ali-Tehrani changed the title Test error in "SVD Converge" for python 3.7 Test error in "SVD Converge" Mar 8, 2021
@Ali-Tehrani Ali-Tehrani changed the title Test error in "SVD Converge" Test error in "SVD Converge" in generic.py Mar 8, 2021
@FarnazH
Copy link
Member

FarnazH commented Mar 9, 2021

These tests passed on Ubunto with python 3.8 before. Is GitHub Action using two different hardware for these runs?
We can decrease the matrix size and see whether it happens again.

The new runs failed again: https://github.com/theochem/procrustes/runs/2069045083#step:5:354

@PaulWAyers
Copy link
Member

The solution here, which may also be useful to @tczorro , is to use LAPACK directly. Construct the factorized SVD (more robust than the divide-and-conquer iterative algorithm), using (I think) scipy.linalg.lapack.gesvd though you may need to prefix the gesvd routine with dgesvd (float) or zgesvd (complex). Once you have the SVD (and this one should always work), you can then use

  1. Replace all elements of the sigma (singular value) diagonal that are greater than a threshhold with their inverse; replace values smaller than the threshold with zero. Call this inverse matrix Sigma-1, and its inverse .T (.H for Hermitian transpose).
  2. Then construct V {dot} Sigma.T {dot} U.H = pseudoinverse.

The traditional threshhold value for step 1 is, for an mxn input matrix A, max(m,n)*epsilon(elements of A). I.e., it is the machine-precision for the elements of A times the number of elements in the larger dimension. This could be approximated easily by merely using max(m,n)*1e-15, which is good enough.

This can be made (potentially much) more efficient by truncating the singular-value-inverse, U, and V to include only the elements corresponding to the nonzero rows/columns. That way the matrix multiplication is faster (because you are not computing a lot of "multiply something by zero" terms).

FarnazH added a commit to Ali-Tehrani/procrustes that referenced this issue Mar 9, 2021
@FarnazH
Copy link
Member

FarnazH commented Mar 9, 2021

We can use the lapack_driver argument of https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.svd.html

FarnazH added a commit to Ali-Tehrani/procrustes that referenced this issue Mar 9, 2021
@Ali-Tehrani
Copy link
Contributor Author

Ali-Tehrani commented Mar 9, 2021

The generic function uses Morse Penrose Inverse from numpy and it uses the divide and conquer method from Lapack (gesdd) to calculate the SVD, whereas SciPy's Morse Penrose inverse uses the general rectangular approach ('gesvd') [1]. So I think that we just need to change from numpy to scipy.

[1] - https://stackoverflow.com/questions/13265299/the-difference-of-pseudo-inverse-between-scipy-and-numpy

FarnazH added a commit to Ali-Tehrani/procrustes that referenced this issue Mar 9, 2021
@PaulWAyers
Copy link
Member

PaulWAyers commented Mar 9, 2021

That's a good trick. @tczorro should look at it too. You have to fall down the rabbit-hole of documentation a bit to find these things sometimes :-( .

If this works, it may be worth remembering that if someone tried a VERY huge matrix, the speed/memory benefits of the pinv2 algorithm in scipy/numpy might help. But Numpy had a stupid tolerance. And if you have a huge matrix, the robustness of a "real SVD" instead of an "iterative SVD" is liekly to be important.....

FarnazH added a commit to Ali-Tehrani/procrustes that referenced this issue Mar 9, 2021
@FarnazH
Copy link
Member

FarnazH commented Mar 9, 2021

I think scipy.linalg.pinv fixed the problem... While we are at this, should we use scipy.linalg.svd instead of numpy.linalg.svd (so we can set the lapack_driver='gesvd')?

@PaulWAyers
Copy link
Member

I would use scipy.linalg.svd with lapack_drive='gesvd' everywhere. Where that is not appropriate (where speed is critical) probably there are better choices than SVD (which is a bit of a brute-force approach inherently).

@FanwangM
Copy link
Collaborator

FanwangM commented Mar 9, 2021

Using scipy.linalg.pinv instead of numpy.linalg.pinv can also fix the problem of failing generic Procrustes, #61. I have tested that this idea can fix the issuse both locally and on GitHub as well. This can lead to a more robust implementation of generic Procrustes testing without setting a random seed. So we should do the resting without adding a random see.

Thanks for proposing to use scipy.linalg.pinv! @FarnazH @Ali-Tehrani

@FarnazH
Copy link
Member

FarnazH commented Mar 10, 2021

@Ali-Tehrani can you please take care of using scipy.linalg instead of numpy.linalg throughout the code? Thanks a lot.

Ali-Tehrani added a commit to Ali-Tehrani/procrustes that referenced this issue Mar 10, 2021
Related to issue theochem#69, since scipy svd allows better control
of the type of SVD algorithm that is being used.
Ali-Tehrani added a commit to Ali-Tehrani/procrustes that referenced this issue Mar 10, 2021
Related to issue theochem#69, added to rotation, permutation
and symmetric.
@Ali-Tehrani Ali-Tehrani mentioned this issue Mar 10, 2021
@FarnazH FarnazH closed this as completed Mar 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants