Bug: GCXS slicing can take inordinate amount of time / crash kernel #853

Open
3 tasks done
dani-corie opened this issue Mar 23, 2025 · 10 comments
Labels
bug Indicates an unexpected problem or unintended behavior need more info Indicates that an issue, pull request, or discussion needs more information upstream

Comments

@dani-corie

sparse version checks

  • I checked that this issue has not been reported before list of issues.

  • I have confirmed this bug exists on the latest version of sparse.

  • I have confirmed this bug exists on the main branch of sparse.

Describe the bug

I have noticed that under some circumstances, slicing a GCXS array takes an unexpected amount of time, or crashes the kernel after a runtime of >45 seconds.

Steps or code to reproduce the bug

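import sparse

# 5-D COO array with four nonzero entries
a = sparse.COO(
  [[1, 100, 215, 66],[5, 101, 242, 11],[3, 5, 1, 11],[13, 1, 3, 1],[55, 1, 6, 8]],
  [5, 10, 2, 1],
  shape=(255, 255, 255, 255, 255)
)
b = a.asformat('gcxs')
# slicing along the first axis triggers the problem
b[1, :, :, :, :]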

Expected results

A slice returned within a short amount of time, given the sparsity of the array.

b[:, 1, :, :, :], b[:, :, 1, :, :] and b[:, :, :, 1, :] return after a fraction of a second.
b[:, :, :, :, 1] for some reason takes ~4 seconds on my machine.
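
The per-axis comparison above could be scripted along these lines. This is only a sketch: `axis_index` and `time_axis_slices` are hypothetical helper names, not part of sparse, and the helper works with any object that supports tuple indexing.

```python
import time


def axis_index(ndim, axis, position):
    """Build an index tuple that picks `position` on `axis` and `:` elsewhere."""
    return tuple(position if i == axis else slice(None) for i in range(ndim))


def time_axis_slices(arr, ndim, axes, position=1):
    """Time `arr[index]` once for each axis in `axes`; returns {axis: seconds}."""
    timings = {}
    for axis in axes:
        index = axis_index(ndim, axis, position)
        start = time.perf_counter()
        _ = arr[index]  # the slice under test; the result is discarded
        timings[axis] = time.perf_counter() - start
    return timings
```

With the array from the report, `time_axis_slices(b, ndim=5, axes=[1, 2, 3, 4])` would cover the slices that do return. Axis 0 is left out of that call on purpose, since it is the one that brings the process down.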

Actual results

Kernel crash after >45 seconds of runtime.

Please describe your system.

  1. OS and version: Ubuntu 22.04, latest update
  2. sparse version: '0.16.0b4'
  3. NumPy version: '2.1.3'
  4. Numba version: '0.61.0'

Relevant log output

The Kernel crashed while executing code in the current cell or a previous cell. 
Please review the code in the cell(s) to identify a possible cause of the failure. 
Click here for more info. 
View Jupyter log for further details.



19:39:49.903 [info] Restarted f3b136f3-350b-49cc-90d0-e337e991f066
19:46:21.391 [error] Disposing session as kernel process died ExitCode: undefined, Reason: 

(this is the full, unabridged log output from the crashed session)
@dani-corie dani-corie added bug Indicates an unexpected problem or unintended behavior needs triage Issue has not been confirmed nor labeled labels Mar 23, 2025
@dani-corie
Author

Here's a transcript of running the same in a python repl:

$ python
>>> import sparse
>>> a = sparse.COO(
...   [[1, 100, 215, 66],[5, 101, 242, 11],[3, 5, 1, 11],[13, 1, 3, 1],[55, 1, 6, 8]],
...   [5, 10, 2, 1],
...   shape=(255, 255, 255, 255, 255)
... )
>>> b = a.asformat('gcxs')
>>> b[1, :, :, :, :]
Killed
$
$ python --version
Python 3.13.1

@hameerabbasi
Collaborator

Funny, I can't seem to reproduce this:

>>> import sparse
>>> a = sparse.COO(
...   [[1, 100, 215, 66],[5, 101, 242, 11],[3, 5, 1, 11],[13, 1, 3, 1],[55, 1, 6, 8]],
...   [5, 10, 2, 1],
...   shape=(255, 255, 255, 255, 255)
... )
>>> b = a.asformat('gcxs')
>>> b[1, :, :, :, :]
<GCXS: shape=(255, 255, 255, 255), dtype=int64, nnz=1, fill_value=0, compressed_axes=(np.int64(0),)>

I also tested with NUMBA_BOUNDSCHECK=1 to make sure no out-of-bounds accesses were happening. Can you provide more info?
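
For anyone who wants to repeat that check: `NUMBA_BOUNDSCHECK` and `NUMBA_DISABLE_JIT` are Numba environment variables, and as far as I know they generally need to be in place before Numba is first imported. A minimal sketch that prepares such an environment for a child process (`numba_debug_env` is a hypothetical helper name, not a Numba or sparse API):

```python
import os


def numba_debug_env(boundscheck=True, disable_jit=False, base=None):
    """Return a copy of the environment with Numba debug switches applied.

    A switch that is off is removed rather than set to "0", so that Numba
    falls back to its default behaviour for it.
    """
    env = dict(os.environ if base is None else base)
    for name, enabled in (
        ("NUMBA_BOUNDSCHECK", boundscheck),
        ("NUMBA_DISABLE_JIT", disable_jit),
    ):
        if enabled:
            env[name] = "1"
        else:
            env.pop(name, None)
    return env
```

The returned dict can be passed as `env=` to `subprocess.run`, so the test script starts in a fresh interpreter that sees the variables from the very beginning.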

@hameerabbasi hameerabbasi added need more info Indicates that an issue, pull request, or discussion needs more information and removed needs triage Issue has not been confirmed nor labeled labels Apr 9, 2025
@dani-corie
Author

Sure, what can I give you so it's hopefully easier to reproduce?

It's a Conda-Forge venv, with Python 3.13, on Ubuntu 22.04 and an Intel 11th gen CPU. Python just quits; I suspect an infinite loop of some kind.

@hameerabbasi
Collaborator

That's counterintuitive: the algorithm is completely deterministic, and an infinite loop would also lead to a timeout on my machine.
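
One way to tell a hang apart from an abrupt kill is to run the reproducer in a child process with a timeout and look at how it ends. A shell printing `Killed` usually means the process received SIGKILL, which on Linux is often (though not necessarily) the out-of-memory killer. The sketch below uses only the standard library; `classify_run` is a hypothetical helper name and the label strings are made up for illustration.

```python
import signal
import subprocess
import sys


def classify_run(code, timeout=60.0, env=None):
    """Run `code` in a fresh interpreter and report how the process ended."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
            env=env,
        )
    except subprocess.TimeoutExpired:
        # Still running when the timeout hit: consistent with a hang or very slow code.
        return "timeout"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was ended by a signal.
        try:
            return "signal:" + signal.Signals(-proc.returncode).name
        except ValueError:
            return "signal:" + str(-proc.returncode)
    if proc.returncode != 0:
        return "exit:" + str(proc.returncode)
    return "ok"
```

Passing the test script's source as `code` gives `"timeout"` for an infinite loop and a `"signal:..."` label when the operating system ends the process, which narrows down what "Python just quits" means on a given machine.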

@dani-corie
Author

Okay so I did an experiment with python -vvv, and it didn't make me any smarter. Here's pip list in case it helps:

$ pip list
Package                   Version        Editable project location
------------------------- -------------- ---------------------------------
anyio                     4.8.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 3.0.0
async-lru                 2.0.4
attrs                     25.1.0
babel                     2.17.0
beautifulsoup4            4.13.3
bleach                    6.2.0
Brotli                    1.1.0
cached-property           1.5.2
certifi                   2025.1.31
cffi                      1.17.1
charset-normalizer        3.4.1
cleantext                 1.1.4
click                     8.1.8
colorama                  0.4.6
comm                      0.2.2
debugpy                   1.8.12
decorator                 5.1.1
defusedxml                0.7.1
exceptiongroup            1.2.2
executing                 2.1.0
fastjsonschema            2.21.1
fqdn                      1.5.1
h11                       0.14.0
h2                        4.2.0
hpack                     4.1.0
httpcore                  1.0.7
httpx                     0.28.1
hyperframe                6.1.0
idna                      3.10
importlib_metadata        8.6.1
importlib_resources       6.5.2
ipykernel                 6.29.5
ipython                   8.32.0
ipywidgets                8.1.5
isoduration               20.11.0
jedi                      0.19.2
Jinja2                    3.1.5
joblib                    1.4.2
json5                     0.10.0
jsonpointer               3.0.0
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
jupyter                   1.1.1
jupyter_client            8.6.3
jupyter-console           6.6.3
jupyter_core              5.7.2
jupyter-events            0.12.0
jupyter-lsp               2.2.5
jupyter_server            2.15.0
jupyter_server_terminals  0.5.3
jupyterlab                4.3.5
jupyterlab_pygments       0.3.0
jupyterlab_server         2.27.3
jupyterlab_widgets        3.0.13
llvmlite                  0.44.0
MarkupSafe                3.0.2
matplotlib-inline         0.1.7
mistune                   3.1.1
nbclient                  0.10.2
nbconvert                 7.16.6
nbformat                  5.10.4
nest_asyncio              1.6.0
nltk                      3.9.1
notebook                  7.3.2
notebook_shim             0.2.4
numba                     0.61.0
numpy                     2.1.3
overrides                 7.7.0
packaging                 24.2
pandas                    2.2.3
pandocfilters             1.5.0
parso                     0.8.4
pexpect                   4.9.0
pickleshare               0.7.5
pip                       25.0
pkgutil_resolve_name      1.3.10
platformdirs              4.3.6
polars                    1.26.0
prometheus_client         0.21.1
prompt_toolkit            3.0.50
psutil                    6.1.1
ptyprocess                0.7.0
pure_eval                 0.2.3
pycparser                 2.22
Pygments                  2.19.1
PySocks                   1.7.1
python-dateutil           2.9.0.post0
python-json-logger        2.0.7
pytz                      2024.1
PyYAML                    6.0.2
pyzmq                     26.2.1
referencing               0.36.2
regex                     2024.11.6
requests                  2.32.3
rfc3339_validator         0.1.4
rfc3986-validator         0.1.1
rpds-py                   0.22.3
scipy                     1.15.1
Send2Trash                1.8.3
setuptools                75.8.0
six                       1.17.0
sniffio                   1.3.1
soupsieve                 2.5
sparse                    0.16.0b4       /home/dani/Workshop/vendor/sparse
stack_data                0.6.3
terminado                 0.18.1
tinycss2                  1.4.0
tomli                     2.2.1
tornado                   6.4.2
tqdm                      4.67.1
traitlets                 5.14.3
types-python-dateutil     2.9.0.20241206
typing_extensions         4.12.2
typing_utils              0.1.0
tzdata                    2025.2
uri-template              1.3.0
urllib3                   2.3.0
wcwidth                   0.2.13
webcolors                 24.11.1
webencodings              0.5.1
websocket-client          1.8.0
widgetsnbextension        4.0.13
zipp                      3.21.0
zstandard                 0.23.0

@dani-corie
Author

Okay, I updated my sparse version to current HEAD (sparse-0.16.1.dev3+ge669565), created a fresh conda env, and just did pip install -e . in the sparse source tree... The result is the same. Then I created another conda env, this time with Python 3.12, with the same result.

Here's the test script in full:

import sparse

a = sparse.COO(
  [[1, 100, 215, 66],[5, 101, 242, 11],[3, 5, 1, 11],[13, 1, 3, 1],[55, 1, 6, 8]],
  [5, 10, 2, 1],
  shape=(255, 255, 255, 255, 255)
)
b = a.asformat('gcxs')

print("# # # # STARTING SLICE OPERATION # # # #")

c = b[1, :, :, :, :]

print("# # # # SLICE OPERATION COMPLETED # # # #")

print("success", len(c))

@hameerabbasi
Collaborator

Hmm, the only deps we have are NumPy and Numba, and I couldn't replicate the issue with those versions. Let me try an Ubuntu 22.04 VM.

@hameerabbasi
Collaborator

hameerabbasi commented Apr 11, 2025

Hmm, you're right. It does happen on Ubuntu 22.04 with Intel emulation on my ARM Mac; so it's either x86-64 related or Linux/Ubuntu 22.04 related.

I'll look into this more, it's a curious bug indeed.

Edit: What's more, it happens with the Numba JIT entirely disabled, which means very likely this is a pure-NumPy bug you've hit on!

Edit 2: It exits almost immediately, it doesn't hang for me.

@hameerabbasi
Collaborator

It seems to be Numba-related, I've reported it upstream at numba/numba#10040.

@dani-corie
Author

> What's more, it happens with the Numba JIT entirely disabled, which means very likely this is a pure-NumPy bug you've hit on!

how fun! 😇

thanks for looking into it...

2 participants