SVD convergence error in the presence of nan's #3225

Closed
rgommers opened this issue May 8, 2021 · 7 comments
Labels
Bug in other software Compiler, Virtual Machine, etc. bug affecting OpenBLAS

Comments

@rgommers
Contributor

rgommers commented May 8, 2021

There seems to be a regression that came in after the 0.3.13 release. In 0.3.12 and 0.3.13, this works fine:

>>> a = np.array([[1, np.nan], [1, 1]])
>>> np.linalg.svd(a)

With 0.3.15, the same call fails with an SVD convergence error. I cannot easily try 0.3.14 because it's not packaged on conda-forge. Users and packagers are observing this on multiple macOS and Linux systems; see numpy/numpy#18914 for details.
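To check which behaviour a given installation shows, the reproducer above can be wrapped so that it reports the outcome instead of stopping at the exception. This is a minimal sketch: `svd_outcome` is a hypothetical helper name, and the result depends on the LAPACK that the installed NumPy is linked against.

```python
import numpy as np


def svd_outcome(a):
    """Report how np.linalg.svd handles `a` on this installation."""
    try:
        _, s, _ = np.linalg.svd(a)
    except np.linalg.LinAlgError:
        # Newer LAPACK: the error code surfaces as a LinAlgError.
        return "raised LinAlgError"
    # Older LAPACK: the call returns, typically with NaN in the result.
    if np.isnan(s).any():
        return "returned NaN"
    return "returned finite"


a = np.array([[1, np.nan], [1, 1]])
print(svd_outcome(a))
```

Running it against two environments that differ only in the OpenBLAS version should show where the behaviour changes.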

@martin-frbg
Collaborator

Bisecting now; I suspect it is one of the recent merges from Reference-LAPACK 3.9.1.

@martin-frbg
Collaborator

martin-frbg commented May 8, 2021

Right, this was imported from Reference-LAPACK PR 471, "Handle norm NaN value in xGESDD" (Reference-LAPACK/lapack#471), which makes LAPACK return an error code rather than silently propagating the NaN like it "always" did. (I notice now that the original issue ticket, Reference-LAPACK/lapack#469, claimed that the code would "exit" or "crash" at that point, which is probably untrue unless it refers to the user's code.)
As an aside, this means that you would also see this when building against pure "netlib" LAPACK 3.9.1.

@rgommers
Contributor Author

rgommers commented May 8, 2021

Argh, so much fun. So not a bug from your perspective, and we should do something LAPACK-version-dependent in NumPy to handle this?
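One way for calling code to avoid depending on the LAPACK version is to check for NaN before calling into the SVD at all. The following is a minimal sketch of such a guard in user code, not what NumPy itself does; `svd_nan_safe` is a hypothetical name, and returning NaN-filled outputs is just one possible policy (raising an explicit error would be another).

```python
import numpy as np


def svd_nan_safe(a):
    """SVD of a 2-D array that behaves the same on any LAPACK version.

    If the input contains NaN, skip LAPACK and return NaN-filled outputs
    with the shapes np.linalg.svd would use for full_matrices=True.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim != 2:
        raise ValueError("expected a 2-D array")
    if np.isnan(a).any():
        m, n = a.shape
        k = min(m, n)
        return (
            np.full((m, m), np.nan),  # u
            np.full(k, np.nan),       # s
            np.full((n, n), np.nan),  # vh
        )
    return np.linalg.svd(a)


u, s, vh = svd_nan_safe([[1, np.nan], [1, 1]])
print(s)
```

The guard costs one extra pass over the input, which is small compared with the decomposition itself.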

@martin-frbg
Collaborator

Not sure if my (frog) perspective counts; I have raised the issue with the more illustrious members of the LAPACK team. Historically, NaN handling appears to have been something that LAPACK has not exactly excelled at. BTW, the patch came from a matplotlib guy, so at least it seems to stay in the family...

@martin-frbg martin-frbg added the Bug in other software Compiler, Virtual Machine, etc. bug affecting OpenBLAS label May 8, 2021
@mattip
Contributor

mattip commented May 9, 2021

> which makes LAPACK return an error code rather than silently propagating the NaN like it "always" did

Is this now "return an error code and propagate the NaN" or "return an error code and not propagate the NaN"? If the latter, that would seem to break backward compatibility.

@martin-frbg
Collaborator

Early return with error code set and arguments unchanged, which is what some other LAPACK routines already did.
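The difference between the two behaviours can be illustrated with a toy model. This is not LAPACK code: the row-sum "computation", both function names, and the error code value are made up for the sketch. It only shows what "propagate the NaN" versus "early return with arguments unchanged" means for a caller that passes in an output buffer.

```python
import math

NAN_INPUT = -4  # hypothetical error code, chosen for this sketch only


def toy_propagate(a, s_out):
    """Old style: always compute, so NaN flows into the output buffer."""
    for i, row in enumerate(a):
        s_out[i] = sum(abs(x) for x in row)  # stand-in for the real work
    return 0  # reports success even though s_out may hold NaN


def toy_early_return(a, s_out):
    """New style: detect NaN up front and return without touching s_out."""
    if any(math.isnan(x) for row in a for x in row):
        return NAN_INPUT
    return toy_propagate(a, s_out)


a = [[1.0, math.nan], [1.0, 1.0]]

old_buf = [0.0, 0.0]
old_info = toy_propagate(a, old_buf)

new_buf = [0.0, 0.0]
new_info = toy_early_return(a, new_buf)

print(old_info, old_buf)
print(new_info, new_buf)
```

In the second case the caller has to inspect the return code, because the buffer still holds whatever it contained before the call.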

@martin-frbg
Collaborator

Closing here as discussion has taken place in the linked numpy ticket and my impression is that we're agreed to go forward with this change now.

3 participants