Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyTorch INTERNAL ASSERTION error when performing SVD on large datasets #55

Open
JanisGeise opened this issue Mar 13, 2025 · 0 comments
Open

Comments

@JanisGeise
Copy link
Contributor

Hi @AndreWeiner,

when performing an SVD on a large dataset, I get the error message

RuntimeError: false INTERNAL ASSERT FAILED at "../aten/src/ATen/native/BatchLinearAlgebra.cpp":1537, please report a bug to PyTorch. linalg.svd: Argument 12 has illegal value. Most certainly there is a bug in the implementation calling the backend library.

This seems to be a known issue and is related to:

and some other.

In summary, the reason for this error is the following:

  • PyTorch and SciPy both rely on LAPACK, which is compiled by default using 32-bit integer support. This can be verified by running e.g. scipy.__config__.show(), which yields in my case:

Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
    pc file directory: /project
    version: 0.3.28
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas32/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.28 DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=64
    pc file directory: /project
    version: 0.3.28
  • the max. number which can be represented is torch.iinfo(torch.int32).max: 2147483647
  • if the size of the data matrix exceeds this value, the SVD can't be computed, since the size of the data_matrix is represented by an int32

To verify the lacking support of 64-bit integer (here for SciPy), one can also execute scipy.linalg.get_lapack_funcs("gesdd", ilp64=True) which should yield the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/blas.py", line 401, in getter
    value = func(names, arrays, dtype, ilp64)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/janis/Daten/Promotion_TUD/Projects/sparseSpatialSampling/venv_scube/lib/python3.12/site-packages/scipy/linalg/lapack.py", line 992, in get_lapack_funcs
    raise RuntimeError("LAPACK ILP64 routine requested, but Scipy "
RuntimeError: LAPACK ILP64 routine requested, but Scipy compiled only with 32-bit BLAS

Potential Workaround

Numpy on the other hand is build supporting 64-bit integer by default, which can be verified by executing numpy.__config__.show() yielding:

Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY
      Haswell MAX_THREADS=64
    pc file directory: /project/.openblas
    version: 0.3.27
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/include
    lib directory: /opt/_internal/cpython-3.12.7/lib/python3.12/site-packages/scipy_openblas64/lib
    name: scipy-openblas
    openblas configuration: OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY
      Haswell MAX_THREADS=64
    pc file directory: /project/.openblas
    version: 0.3.27

(indicated by entry USE64BITINT).

Maybe we can replace the current implementation of the SVD class within flowtorch.data.svd with numpy.linalg.svd() to avoid this issue, or add some check of the data_matrix before computing the SVD.

Regards,
Janis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant